
Columbia International Publishing
Journal of Advanced Computing
(2014) Vol. 3 No. 1 pp. 12-34
10.7726/jac.2014.1002a
Research Article
Informative and Uninformative Regions Detection
in WCE Frames
Omid Haji Maghsoudi1*, Alireza Talebpour2, Hamid Soltanian-Zadeh3, Mahdi Alizadeh4,
and Hossein Asl Soleimani5
Received 3 February 2014; Published online 8 March 2014
© The author(s) 2014. Published with open access at www.uscip.us
Abstract
Wireless capsule endoscopy (WCE) is a relatively new device that examines the entire gastrointestinal (GI) tract. About 55,000 frames are captured during an examination (two frames per second), so an automatic method for detecting diseased frames, or diseased regions within a frame, is beneficial. WCE videos contain many uninformative regions (such as intestinal juice, bubbled, and dark regions); therefore, preprocessing is useful and necessary for disease detection. In this paper, three practical methods are introduced to detect the informative and uninformative regions in a frame. To achieve this goal, morphological operations, fuzzy k-means clustering, a sigmoid function, statistical features, Gabor filters, the Fisher test, neural networks, and discriminators in the HSV color space are used to detect uninformative regions (regions that do not carry clinical information) in a frame. Our experimental results indicate that precision, sensitivity, accuracy, and specificity are, respectively, 97.76%, 97.80%, 98.15%, and 98.40% in the first method; 93.32%, 84.60%, 91.05%, and 95.67% in the second method; and 93.32%, 84.60%, 91.05%, and 95.67% in the third method.
Keywords: Wireless Capsule Endoscopy; Gabor Filter; Haralick Features; Laplacian of Gaussian; Morphology; Neural Network
1. Introduction
Wireless capsule endoscopy (WCE) is a method used to investigate the GI tract, particularly the small bowel. Traditional endoscopies enable physicians to view only both ends of the GI tract (Iddan et al., 2000), whereas the wireless capsule endoscope can be used to detect diseases in the small bowel (Fritscher-Ravens and Swain, 2002) that are not easily detected by ordinary endoscopy.
* Corresponding email: o.maghsoudi@hotmail.com
1 Department of Medical Radiation Engineering, University of Shahid Beheshti, Tehran, Iran
2 Department of Electrical and Computer Engineering, University of Shahid Beheshti, Tehran, Iran
3 Control and Intelligent Processing Center of Excellence (CIPCE), School of Electrical and Computer Engineering, University
of Tehran, Tehran, Iran; and Medical Image Analysis Lab., Henry Ford Health System, Detroit, Michigan, USA
4 Department of Bioengineering, Temple University, Philadelphia, USA
5 Digestive Disease Research Institute, Shariati Hospital, Tehran, Iran
To reduce reviewing time and increase the accuracy of automatic detection of abnormalities (diseases), it is beneficial to find the unimportant regions of a frame that do not carry clinical information (uninformative regions). Eliminating uninformative regions (dark or bubbled regions) in a frame can improve the detection of abnormalities. Moreover, physicians can set a threshold to reduce the number of captured frames according to the study at hand; for example, a frame containing more than 70% uninformative regions can be labeled as an uninformative frame and omitted from the video. This last advantage (selecting a threshold to omit some frames) helps physicians obtain results more efficiently than a method that omits frames based on a fixed procedure or fixed parameters; it can also help WCE manufacturers find the best threshold for their own software.
In addition to these benefits for physicians and manufacturers, a flexible threshold can help find special cases; for example, extraneous matter is treated here as an uninformative region when detecting abnormalities, whereas detecting extraneous matter may itself be helpful in other studies (Oh et al., 2007).
Methods proposed by Oh et al. (2007), Vilarino et al. (2006a, 2006b), Bejakovic et al. (2009), and Bashar et al. (2012) signify the importance of uninformative region detection in WCE or colonoscopy videos. These researchers examined methods to reduce the visualization time, whereas the main purpose of this study is to devise methods that detect uninformative regions within a frame.
An automatic method was devised by Oh et al. (2007) to reduce the number of uninformative (out-of-focus) frames in traditional endoscopy (colonoscopy) video.
A method was proposed by Vilarino et al. (2006a, 2006b) to decrease the number of frames based on Gabor filters that characterize the bubble-like shapes (Vilarino et al., 2006a). The proposed method reduced the visualization time by removing frames full of juice or bubbled regions. As they reported, their algorithm reduced the frames by 20%, but they did not report any measures for separated regions within a frame (they focused on frame-level analysis). In other words, the introduced method omitted frames, but it did not distinguish between informative and uninformative regions in a frame.
The ROC (receiver operating characteristic) curve with eight classifiers was studied by Vilarino et al. (2006b). The individual classifiers were: the linear discriminant classifier (LDC), the quadratic discriminant classifier (QDC), the logistic classifier (LOGLC), the k-nearest neighbour classifier (K-NN) with K = 1, 5, and 10, decision trees (DT), and the Parzen classifier. Thirty-four features were extracted from nine consecutive (uninterrupted) frames: the mean (1 feature × 9 frames = 9 features), the hole size of each frame (9 features), the global contrast of each frame (9 features), the correlations among these features (6 features), and the variance among the frames. Their experimental results indicated that the proposed methods could reduce the visualization time.
Our methods are designed so that they can distinguish informative from uninformative regions within a frame. Although the methods proposed by Vilarino et al. (2006a, 2006b) gave good results for omitting frames containing juice, no impressive results were reported for air bubble detection.
Bejakovic et al. (2009) reported average sensitivity and accuracy for extraneous matter and bubble detection using different methods: 76.50% and 87.30% using the MPEG-7 dominant color descriptor, 68.60% and 77.70% using the edge histogram descriptor, 65.10% and 88.90% using the homogeneous texture descriptor, and 61.30% and 77.70% using Haralick statistics.
A method was introduced by Bashar et al. (2012) to detect informative frames. The authors used two steps: the first isolates highly contaminated non-bubbled (HCN) frames, and the second identifies bubbled frames. They used local color moments and the HSV color histogram to characterize HCN frames, and a support vector machine (SVM) was then applied to classify the frames. In the second step, a Gauss-Laguerre transform (GLT), based on texture features, was used to isolate the bubble structures. Finally, informative frames were detected using a threshold on the segmented regions. Feature selection was performed automatically by analyzing the consistency of the energy-contrast map. The combination of their proposed color and texture features achieved average detection accuracies of 86.42% and 84.45%. As in the previous studies, their work was limited to frame reduction.
In this paper, three methods are introduced to distinguish between informative and uninformative regions in a frame; the aim is not to reduce the visualization time, although frames can also be reduced (which we study in order to compare with the previous methods). Moreover, as mentioned above, the methods can increase the accuracy of other automatic methods, such as automatic detection of different organs by Haji-Maghsoudi et al. (2012) and Mackiewicz et al. (2008), bleeding regions by Li and Meng (2009), Pan et al. (2010), and Haji-Maghsoudi and Soltanian-Zadeh (2013), and tumor regions by Haji-Maghsoudi and Soltanian-Zadeh (2013), Kumar et al. (2012), Li and Meng (2012), and Haji-Maghsoudi et al. (2012).
Pan et al. (2011) introduced a method based on color similarity measurements to find bleeding regions in a WCE frame. They also introduced another bleeding detection method (Pan et al., 2010) using a probabilistic neural network. Li and Meng (2009) presented a method based on chrominance moments as color features to detect bleeding regions in a WCE frame. Junzhou et al. (2011) presented a method to detect bleeding regions based on contourlet features. Karargyris and Bourbakis (2009) illustrated a method using Gabor filters, color and texture features, and a neural network to detect ulcer regions in WCE frames. Karargyris and Bourbakis (2011) also developed another method to detect ulcer and polyp frames in WCE videos.
A method was introduced by Kumar et al. (2012) to detect frames containing Crohn's disease. Haji-Maghsoudi et al. (2012) proposed a method to segment abnormal regions in a frame based on the intensity value. In addition, Haji-Maghsoudi and Soltanian-Zadeh (2013) presented a method using local fuzzy patterns (LFP) to detect frames containing diseased regions.
In addition to the research mentioned above, the methods proposed by Mackiewicz et al. (2008), Coimbra and Silva Cunha (2006), and Szczypinski et al. (2012) combine geometry, color, and texture features in WCE frame analysis.
A method was proposed by Mackiewicz et al. (2008) to discriminate between esophagus, stomach,
small intestine, and colon tissue in WCE videos. The authors used a combination of geometry, color, and texture features to analyze the WCE frames. Compressed hue-saturation histograms were used to extract color features; LBP was used to extract color and texture features; each frame was divided into sub images to define the regions of interest for feature extraction; and motion features were extracted from the images using adaptive rood pattern search. Then, the video was segmented into meaningful parts using a support vector classifier and a multivariate Gaussian classifier built within the framework of a hidden Markov model.
Haji-Maghsoudi et al. (2012) presented two methods to distinguish different organs in WCE videos based on texture and color features. The proposed organ detection methods may be improved using the methods presented in this study.
As mentioned above, previous researchers studied methods for reducing the visualization time, which help physicians review WCE videos more easily. Although the visualization time is an important issue for this device, omitting uninformative regions within a frame can help physicians study on a case-by-case basis, manufacturers improve their software, and researchers improve the results of their automatic methods; therefore, we propose methods that achieve this goal.
Three novel methods are proposed to detect bubble and juice regions in WCE frames. In Section 2, the methodologies and functions that are optimized and utilized in this study are explained. Our experimental results are illustrated in Section 3, and the advantages and disadvantages of our methods are discussed in Section 4.
2. The Proposed Algorithms
In this section, we describe the sequence of steps used to devise each of the three methods.
2.1 First Method: Using Morphological Operations
In the first method, the following sequence of steps is used:
1. First, a gray scale image is generated from the RGB frame. Then, morphological operations are applied to the gray scale frame. Mathematical morphology (MM) comprises techniques and theories for the analysis and processing of geometrical structures, based on topology and random functions. MM operations were originally developed for binary images and were later extended to gray scale functions and images (Haralick et al., 1987); the subsequent generalization to complete lattices is widely accepted today as the theoretical foundation of MM. In this paper, dilation and closing operations are applied to the gray scale image: four dilations and four closings with structuring elements of 13, 11, 9, and 7 pixels. To smooth the image after the MM operations, median filters (Lim, 1990) with 25×25 and 35×35 pixel windows are used. Figure 1 shows the effect of these operations on sample frames (a minimal sketch of this step is given below).
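The following is a minimal Python sketch of this step, written with scikit-image and SciPy for illustration (the paper does not specify an implementation); the disk shape of the structuring elements and the mapping of the listed sizes (13, 11, 9, and 7 pixels) to disk radii are our assumptions.

from skimage.color import rgb2gray
from skimage.morphology import dilation, closing, disk
from scipy.ndimage import median_filter

def emphasize_round_objects(rgb_frame):
    """Gray scale conversion, four dilations/closings, then median smoothing."""
    gray = rgb2gray(rgb_frame)                 # float image in [0, 1]
    for size in (13, 11, 9, 7):                # element sizes listed in the paper
        selem = disk(size // 2)                # assumed disk-shaped structuring element
        gray = dilation(gray, selem)
        gray = closing(gray, selem)
    gray = median_filter(gray, size=25)        # 25x25 smoothing window
    gray = median_filter(gray, size=35)        # 35x35 smoothing window
    return gray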
2. The fuzzy k-means (FKM) clustering with five clusters is applied to find the lighter regions in a frame (as demonstrated in Figure 1). The FKM algorithm performs iteratively until it achieves absolute classes (each point has to belong to a single class).
Fig. 1. Column (a) shows the original frames, (b) shows the effect of the morphological operations on the frames in column (a), (c) is the median-filtered version of (b), (d) is the fuzzy k-means result with five clusters applied to column (c), and (e) shows the result of applying the sigmoid function to the frames in (c) after using the first neural network.
Fig. 2. The effect of the sigmoid function with different parameters. Column (a) shows the original images, and (b) the input images to the sigmoid function derived from (a). Columns (c), (d), and (e) are the results of applying sigmoid functions with Cutoff = 0.95, 0.75, and 0.55, respectively, and Gain = 50.
The applications of FKM were described by Döring et al. (2006). We should point out that most bubbled regions are relatively light, and this lightening is further increased by the morphological operations in the previous step (which enlarge round objects such as bubbles). Five clusters are generated to distinguish, after applying the MM operations, among the borders (the darkest class, or black pixels), dark regions (which do not carry important information for the next processes), tissue with mediocre intensities, light regions (normal or diseased regions are usually included in these two latter groups), and the lightest regions (usually bubbled regions). Based on this clustering, the average intensity value of the pixels in each cluster is calculated (using a logical AND operation, the corresponding regions are extracted from the output image of the morphological processing). Then, a neural network (Leondes, 1998) with one hidden layer and an output layer is used to divide the images into two groups: images that contain no round objects, and images that contain bubbles, juice, or other round objects (a minimal sketch of the clustering step is given below).
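A minimal clustering sketch follows; it groups the pixel intensities of the MM-processed frame into five fuzzy clusters. The fuzziness exponent m = 2, the tolerance, and the random initialization are our assumptions, since the paper fixes only the number of clusters.

import numpy as np

def fuzzy_cmeans_1d(values, n_clusters=5, m=2.0, max_iter=100, tol=1e-5, seed=0):
    """Return the cluster centers and a hard label for each intensity value."""
    rng = np.random.default_rng(seed)
    x = values.reshape(-1, 1).astype(float)              # (N, 1) pixel intensities
    u = rng.random((len(x), n_clusters))
    u /= u.sum(axis=1, keepdims=True)                    # fuzzy membership matrix
    for _ in range(max_iter):
        um = u ** m
        centers = (um.T @ x) / um.sum(axis=0)[:, None]   # weighted cluster centers
        dist = np.abs(x - centers.T) + 1e-12             # (N, c) distances to centers
        new_u = 1.0 / dist ** (2.0 / (m - 1.0))
        new_u /= new_u.sum(axis=1, keepdims=True)
        if np.abs(new_u - u).max() < tol:
            u = new_u
            break
        u = new_u
    return centers.ravel(), u.argmax(axis=1)             # hard ("absolute") classes

The hard labels, reshaped back to the frame, give the five intensity clusters shown in column (d) of Figure 1.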
3. Now, a sigmoid function is used to segment the region of interest for the next processes, based on the output of the first neural network. The sigmoid function is a continuous non-linear function; its name is derived from the fact that the function is S-shaped (Ramesh et al., 1995). Using f(x) for the output and G(x) as a gain,

f(x) = \frac{1}{1 + e^{G(x)}} (1)
Here, a new form of the sigmoid function is used with additional parameters, where I(i,j) is the unprocessed input image, J(i,j) is the output image, Gain is a parameter that changes the slope, and Cutoff is a parameter that shifts the data:

J(i,j) = \frac{1}{1 + e^{Gain \times (Cutoff - I(i,j))}} (2)

Based on the sigmoid function, the input matrix values are mapped between zero and one; the output image is then rescaled to the gray scale range [0, 255] after applying the sigmoid function. The use of the sigmoid function for disease segmentation in WCE frames was illustrated by Haji-Maghsoudi et al. (2012).
Following the classification in step 2, the sigmoid function is applied with the following parameters (depending on the class):
• Gain = 50, Cutoff = 0.55 for frames that contain round objects
• Gain = 50, Cutoff = 0.95 for frames that do not contain round objects
The effects of these processes are demonstrated in Figure 1, and Figure 2 shows how changing the parameters of the sigmoid function changes the segmented regions.
Then, the image is divided into 256 sub images of 32×32 pixels in order to achieve more accurate results (a minimal sketch of the sigmoid transform is given below).
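A minimal sketch of the modified sigmoid of Eq. (2) follows; Gain = 50 and the two class-dependent cut-off values come from the text, and the input is assumed to be a gray scale image already scaled to [0, 1].

import numpy as np

def sigmoid_stretch(image, gain=50.0, cutoff=0.55):
    """J = 1 / (1 + exp(Gain * (Cutoff - I))), rescaled to the 8-bit gray range."""
    out = 1.0 / (1.0 + np.exp(gain * (cutoff - image)))
    return np.round(out * 255).astype(np.uint8)

Cutoff = 0.55 is used for frames that the first network labels as containing round objects and Cutoff = 0.95 otherwise; Figure 2 shows the effect of varying the cut-off.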
4. The Gabor filter (the G function) is a sinusoidal plane wave of a particular frequency and orientation, modulated by a Gaussian envelope, which is convolved with the image. This filter has good localization properties in both the spatial and frequency domains and has been used for texture segmentation by Jain et al. (1997), Clausi and Jernigan (2000), and Bovik et al. (1990). The impulse response of the 2D Gabor filter is
G = \frac{1}{2\pi\sigma_x\sigma_y} \, e^{-\left(\frac{X^2}{2\sigma_x^2} + \frac{Y^2}{2\sigma_y^2}\right)} \, e^{i(2\pi f X + \phi)} (3)

X = x\cos(\theta) + y\sin(\theta) (4)

Y = -x\sin(\theta) + y\cos(\theta) (5)
where θ is the rotation angle of the impulse response, x and y are the coordinates, σx and σy are the standard deviations of the Gaussian envelope in the x and y directions, respectively, and f and φ are the frequency and phase of the sinusoid, respectively.
A filter bank of 24 Gabor filters is created from the following parameter combinations:
f = 0.25, 1, and 3; θ = 0, 45, 90, and 135 degrees; φ = 0;
x and y ranges = 10; (σx = 1, σy = 2) and (σx = 1, σy = 1).
The Gabor filter parameters were selected experimentally, although the method presented by Bovik et al. (1990) made the choice easier. A minimal sketch of the filter bank is given below.
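A minimal sketch of the 24-filter Gabor bank of Eqs. (3)-(5) follows; the kernel support of ±10 pixels reflects the stated x and y ranges, and the direct sampling of the impulse response (rather than any particular library routine) is our choice.

import numpy as np

def gabor_kernel(f, theta_deg, sigma_x, sigma_y, phi=0.0, half_size=10):
    """Complex Gabor impulse response of Eq. (3) on a (2*half_size+1)^2 grid."""
    theta = np.deg2rad(theta_deg)
    y, x = np.mgrid[-half_size:half_size + 1, -half_size:half_size + 1]
    X = x * np.cos(theta) + y * np.sin(theta)              # Eq. (4)
    Y = -x * np.sin(theta) + y * np.cos(theta)             # Eq. (5)
    envelope = np.exp(-(X**2 / (2 * sigma_x**2) + Y**2 / (2 * sigma_y**2)))
    carrier = np.exp(1j * (2 * np.pi * f * X + phi))
    return envelope * carrier / (2 * np.pi * sigma_x * sigma_y)

bank = [gabor_kernel(f, th, sx, sy)
        for f in (0.25, 1, 3)
        for th in (0, 45, 90, 135)
        for (sx, sy) in ((1, 2), (1, 1))]                  # 3 x 4 x 2 = 24 filters

Each 32×32 sub image is then convolved with every kernel (for example with scipy.signal.fftconvolve) before the texture features of step 5 are computed.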
5. After applying the Gabor filter bank, the 14 Haralick features are extracted from the gray level co-occurrence matrix (GLCM): contrast, correlation, entropy, energy, difference variance, difference entropy, information measure of correlation 1, information measure of correlation 2, inverse difference, sum average, sum variance, sum of squares, sum entropy, and maximum correlation coefficient (Haralick et al., 1973).
Fig. 3. Some final frames in the first method. Rows (a) and (c) are the original images without any processing, and rows (b) and (d) are the output images after applying all the processes to the original images.
Other statistical features are also extracted from the GLCM: autocorrelation, cluster prominence, cluster shade, dissimilarity, homogeneity, maximum probability, inverse difference normalized, and inverse difference moment normalized (Soh and Tsatsoulis, 1999; Clausi, 2002). Three further features, mean, skewness, and kurtosis, are extracted from the gray scale image. In this paper, GLCM features refers to these 25 features (14 Haralick features and 11 other statistical features).
Therefore, the 25 features computed for each of the 24 Gabor filters produce 600 features per sub image, and another 100 features are extracted using the GLCM features of the gray scale frame at four different angles (0, 45, 90, and 135 degrees). As a result, a matrix of n×700 features is created, where n is the number of sub images (with a maximum of 256 sub images). A minimal sketch of the GLCM feature extraction is given below.
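A minimal sketch of the GLCM feature extraction follows, using scikit-image for illustration; graycoprops covers only a subset of the 25 listed statistics (contrast, correlation, energy, homogeneity, and dissimilarity), so the remaining Haralick measures would have to be computed from the normalized co-occurrence matrix directly.

import numpy as np
from skimage.feature import graycomatrix, graycoprops

def glcm_features(sub_image_u8):
    """Co-occurrence statistics of one 32x32 sub image at 0, 45, 90, and 135 degrees."""
    angles = [0, np.pi / 4, np.pi / 2, 3 * np.pi / 4]
    glcm = graycomatrix(sub_image_u8, distances=[1], angles=angles,
                        levels=256, symmetric=True, normed=True)
    feats = []
    for prop in ("contrast", "correlation", "energy", "homogeneity", "dissimilarity"):
        feats.extend(graycoprops(glcm, prop).ravel())      # one value per angle
    mean = sub_image_u8.mean()                             # simple intensity statistics
    centered = sub_image_u8 - mean
    std = sub_image_u8.std() + 1e-12
    skewness = (centered**3).mean() / std**3
    kurtosis = (centered**4).mean() / std**4
    return np.array(feats + [mean, skewness, kurtosis])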
6. The Fisher test (Gu et al., 2012) is used to score the features and choose the best 100 for each sub image (the features are normalized between 0 and 1 before applying the Fisher test). Then, the second neural network is trained on the features selected by the Fisher test, so that each sub image can be classified as informative or uninformative (a minimal sketch of the Fisher score is given below).
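A minimal sketch of the Fisher score ranking follows; the features are assumed to be normalized to [0, 1] beforehand, as stated, and the 100 best-scoring features are kept for the second neural network.

import numpy as np

def fisher_scores(X, y):
    """Fisher score of each feature for a two-class problem (X: samples x features)."""
    scores = np.empty(X.shape[1])
    overall_mean = X.mean(axis=0)
    for j in range(X.shape[1]):
        between, within = 0.0, 0.0
        for c in np.unique(y):
            xc = X[y == c, j]
            between += len(xc) * (xc.mean() - overall_mean[j]) ** 2  # between-class scatter
            within += len(xc) * xc.var()                             # within-class scatter
        scores[j] = between / (within + 1e-12)
    return scores

# indices of the 100 selected features:
# selected = np.argsort(fisher_scores(X, y))[::-1][:100]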
7. The bubbled regions are recognized in a frame (at the resolution of the 32×32 sub images) using the steps mentioned above. Now, a threshold on hue, saturation, and intensity value is applied (Naik and Murthy, 2003) to omit the dark regions. In the final step, a median filter is applied to smooth the segmented regions (this time, the median filter window size is 15). The gray scale image is converted to binary, and then the 'AND' function is used to omit the dark regions in the RGB frame. This step is illustrated in Figure 10. The three HSV ranges are:
• Hue ≥ 50.4, Saturation ≥ 0.75, Value ≤ 0.5
• Hue ≥ 90, For All Saturation, For All Value
• For All Hue, For All Saturation, Value ≤ 0.25
The HSV discriminator ranges are studied further in the Results section (a minimal sketch of this step is given below).
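A minimal sketch of the HSV discriminator follows; the hue thresholds 50.4 and 90 are taken from the list above and appear to be expressed in degrees (saturation and value in [0, 1]), and treating a pixel that matches any of the three ranges as uninformative is our interpretation.

import numpy as np
from skimage.color import rgb2hsv

def uninformative_hsv_mask(rgb_frame):
    """Boolean mask of the pixels removed by the three HSV ranges."""
    hsv = rgb2hsv(rgb_frame)
    h = hsv[..., 0] * 360.0                              # assumed: hue expressed in degrees
    s, v = hsv[..., 1], hsv[..., 2]
    juice = (h >= 50.4) & (s >= 0.75) & (v <= 0.5)       # first range: intestinal juice
    matter = h >= 90.0                                   # second range: extraneous matter
    dark = v <= 0.25                                     # third range: dark regions
    return juice | matter | dark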
To justify the first algorithm, the following factors should be considered: the MM operations are designed to emphasize the round objects in a frame; the median filter is used to smooth the frame rather than to reduce noise; fuzzy k-means is used to classify the pixels of a frame into five clusters; the sigmoid function is an effective way to emphasize the regions selected by the clustering; texture features are extracted from the sub images and used as the neural network inputs, while the Fisher score test is used to optimize the features (to find the best ones); and the HSV discriminator plays a significant role in omitting the dark parts of a frame. Because the HSV discriminator and the sub images determine the resolution at which the neural network detects regions, a median filter is used to smooth the output image.
Fig. 4. The left chart shows the first proposed algorithm, and the right one shows the second algorithm.
2.2 Second Method: Using the LOG Filter
In the second algorithm, the Laplacian of Gaussian (LOG) filter is used (to our knowledge, this is the first time the LOG has been used for bubble-shape recognition). The Gaussian can be used in edge detection procedures. First, the Gaussian function is smooth and localized in both the spatial and frequency domains, providing a good compromise between the need to avoid false edges and the need to minimize errors in edge position (Marr and Hildreth, 1980). In fact, Torre and Poggio (1986) described the Gaussian as the only real-valued function that minimizes the product of spatial-domain and frequency-domain spreads. The Laplacian of Gaussian essentially acts as a band-pass filter because of its differentiating and smoothing behavior. Second, the Gaussian is separable, which makes the computation very efficient. Moreover, the LOG has recently been used to detect blobs in microscopic images, which shows how useful it can be for detecting round objects (Akakin and Sarma, 2013). The second algorithm is described in detail in the following steps:
1. The same HSV discriminator, with the same ranges as in the first method, is applied. After applying this discriminator to a frame, the median filter is used with the same window size as in the first method. The HSV discriminator is used as the first step because it reduces the time of the subsequent processes. In the first method, the HSV discriminator could not be used as the first step because it would affect the features extracted with the Gabor filters, which are not used here (the Gabor filters extract texture and shape features, and these discriminators can affect the shape).
Fig. 5. Row (a) shows the original frames; row (b) shows the effect of the LOG filter on (a) with sigma = 0.09; row (c) shows the LOG filter with sigma = 0.3.
Fig. 6. Some final frames after using the second method. Row (a) shows the results of applying the different processes to a sample frame (from left to right: original; HSV discrimination and median filter; LOG; after the neural network (three images); again HSV discrimination and median filter). Row (b) shows eight original images whose output images are shown in row (c).
2. The gray scale histogram is equalized. Zimmerman et al. (1988) described the advantages of histogram equalization for image processing; moreover, Cromartie and Pizer (1991) demonstrated the effectiveness of adaptive histogram equalization for edge detection.
3. The LOG filter is now applied with sigma = 0.09. Figure 5 shows the effect of this filter on the frames (a minimal sketch of steps 2 and 3 is given below).
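A minimal sketch of steps 2 and 3 follows, using SciPy's Gaussian-Laplace operator for illustration; note that the paper's sigma = 0.09 follows its own implementation's convention and may need rescaling for this function.

from skimage import exposure
from scipy.ndimage import gaussian_laplace

def log_response(gray_frame, sigma=0.09):
    """Histogram equalization (step 2) followed by the Laplacian of Gaussian (step 3)."""
    equalized = exposure.equalize_hist(gray_frame)    # enhance the gray scale frame
    return gaussian_laplace(equalized, sigma=sigma)   # emphasize bubble-like blobs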
4. As in the first method, sub images with a resolution of 32×32 pixels are created; but unlike the first algorithm, the preprocessing steps (in the first method, we focused on bubbled regions using morphological operations, FKM, and the sigmoid function) and the Gabor filters are not used here;
therefore, the GLCM features are extracted for each sub image: 25 features from the original frame and 25 features after applying the LOG filter.
5. Now, a neural network is trained on these fifty features. The scheme of the second algorithm is illustrated in Figure 4.
The methods used in the second algorithm are used for the same reasons given for the first algorithm (the LOG is used instead of MM, fuzzy k-means, and the sigmoid function), except for two additional changes: histogram equalization and the LOG. Histogram equalization is used to enhance the gray scale frames and helps the LOG find edges better; the LOG is the main detector of the bubbled parts of a frame.
2.3 Third Method: Using the Chan-Vese Active Contour
Chan and Vese proposed models based on the original Mumford-Shah functional to segment inhomogeneous objects (Truc et al., 2011; Li et al., 2011; Chan and Vese, 2001; Samson et al., 2000; Zhang et al., 2010). They optimize the discontinuities in the form of a curve evolution problem involving the level set method and the solution of Poisson partial differential equations. Although these methods represent the objects of interest well, the process is complicated and requires a good initialization (Truc et al., 2011). The Chan-Vese model is a region-based active contour; therefore, it does not use the image gradient and performs better on images with weak object boundaries. Moreover, this model is less sensitive to the initial contour; however, placing the initial contour near the object reduces the computational cost (Li et al., 2007).
The third algorithm is described in detail in the following steps:
Fig. 7. The schematic of the third method.
1. A threshold is defined to find which frames contain black spots. These black spots are suspected of being bubbled regions. As illustrated in Figure 8, black spots are much more widespread in frames containing bubbled regions than in frames without any uninformative regions.
Fig. 8. Black spots are shown in negative images.
2. If black spots are found, the Chan-Vese active contour is used to determine the bubbled regions in the frame. This process is demonstrated in Figure 9. Initial seeds for the Chan-Vese active contour are found using the black spots visible in the negative image transformation; the effect of using black spots as initial seeds is demonstrated in Figure 9. In this paper, the initial seed of the contour is defined by the median point of the vector of black-spot positions in the frame.
3. The HSV discriminator is used to omit dark regions.
The overview of this algorithm is illustrated in Figure 7, and a minimal sketch is given below. The number of dark pixels is counted for all the frames, which shows that this number is higher in frames containing bubbled regions than in frames without them, as illustrated in Table 1.
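A minimal sketch of the third method follows, using scikit-image's Chan-Vese implementation for illustration; the black-spot threshold, the seed radius, and the conversion of the median black-spot position into an initial level set are our assumptions.

import numpy as np
from skimage.segmentation import chan_vese

def bubble_mask_chan_vese(gray_frame, spot_threshold=0.1, seed_radius=10):
    """Segment bubbled regions only when dark black spots are present (steps 1-2)."""
    spots = gray_frame < spot_threshold                   # step 1: suspicious black spots
    if not spots.any():
        return np.zeros(gray_frame.shape, dtype=bool)     # no spots: skip the contour
    rows, cols = np.nonzero(spots)
    r0, c0 = int(np.median(rows)), int(np.median(cols))   # median black-spot position
    yy, xx = np.ogrid[:gray_frame.shape[0], :gray_frame.shape[1]]
    init = np.where((yy - r0) ** 2 + (xx - c0) ** 2 <= seed_radius ** 2, 1.0, -1.0)
    return chan_vese(gray_frame, init_level_set=init)     # step 2: region-based contour

The HSV discriminator of step 3 is then applied to the frame exactly as in the first method.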
Fig. 9. The effect of the initial seed when using the Chan-Vese active contour. The left image shows the result for a sample frame without using the black spots, and the right one shows the result of using a black spot as the initial seed.
Table 1. Quantitative comparison of dark pixels in the presence and absence of bubbles.
3. Experiments and Results
The Given Imaging PillCam SB videos were collected from the Given Imaging website (givenimageing.com). First, each video is digitized into 40 frames (the recording time is 20 seconds and the device captures two frames per second). The frame size is 576×576 pixels, reduced to 512×512 by omitting the borders. In the proposed methods, each frame is divided into 32×32-pixel sub images, so 256 sub images are generated from a frame. In total, 60 videos were selected and 2400 frames were extracted from them (984 frames containing bubbled regions).
One question must be clarified: what counts as air bubble and what counts as juice? In Figure 1, the sixth row shows an example of a frame containing air bubble regions and the second row shows an example of a frame containing juice. If a frame contains more than 5000 bubbled pixels, it counts as a frame containing air bubble regions; detecting this type of bubble is more difficult because it is more similar to the informative regions (especially in the second and third methods). If the frame has fewer than 5000 such pixels, it counts as a frame containing juice in this study (so juice here means intestinal juice and small air bubbles).
The specialist (Dr. Soleimani) distinguished between the two region types (informative and uninformative) in the WCE frames. A feed-forward back-propagation multilayer neural network is used in the first and second methods, with three hidden layers (only for the second network). In the second neural network of the first method, the input layer receives the one hundred selected features, the first hidden layer has 39 neurons, the second hidden layer has 15 neurons, the third hidden layer has 6 neurons, and the output layer has two neurons to classify the regions into two separate classes (informative and uninformative); a minimal sketch of this network is given below.
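A minimal sketch of this network follows, using scikit-learn's MLPClassifier in place of the authors' feed-forward back-propagation network; the activation function and solver are our assumptions.

from sklearn.neural_network import MLPClassifier

clf = MLPClassifier(hidden_layer_sizes=(39, 15, 6),   # the three hidden layers
                    activation="logistic",            # assumed sigmoidal units
                    solver="adam",
                    max_iter=1000,
                    random_state=0)
# clf.fit(X_train, y_train) classifies each sub image as informative or uninformative
# from its 100 Fisher-selected features.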
To define the numbers of neurons in the hidden layers, six numbers between 20 and 60 for the first hidden layer (this number is called X), five numbers between 0.2×X and 0.6×X for the second layer (this number is called Y), and three numbers between 0.2×Y and 0.6×Y for the third layer are selected using random sub-sampling cross validation (the factors 0.2 and 0.6 were chosen to keep the range sensible and to reduce computation). In the end, the numbers of neurons are chosen based on the best neural network performance; the best performance among the neural networks is 0.016 in the first method.
Table 2. Accuracy, sensitivity, specificity, and precision of the methods (total uninformative parts).
Table 3. Accuracy, sensitivity, specificity, and precision of the methods (air bubbled and dark parts).
In the first network of the first method, a simple network with one hidden layer of 5 neurons is used. The accuracy, precision, sensitivity, and specificity are measured as follows:
Sensitivity = \frac{TP}{TP + FN} (6)

Specificity = \frac{TN}{TN + FP} (7)

Accuracy = \frac{TP + TN}{TP + FN + TN + FP} (8)

Precision = \frac{TP}{TP + FP} (9)

where TP, FN, TN, and FP denote, respectively, the number of pixels in an unimportant region that are correctly labeled, the number of pixels in an unimportant region that are incorrectly labeled as important, the number of pixels in important regions that are correctly labeled as important, and the number of pixels in important regions that are incorrectly labeled as unimportant (bubble, juice, and dark regions) (Altman and Bland, 1994). The accuracy, precision, sensitivity, and specificity computed for the methods are listed in Table 2; a minimal sketch of these measures follows.
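The following minimal sketch computes the four measures of Eqs. (6)-(9) from the pixel counts defined above.

def pixel_measures(tp, fn, tn, fp):
    """Sensitivity, specificity, accuracy, and precision from pixel-level counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    precision = tp / (tp + fp)
    return sensitivity, specificity, accuracy, precision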
As mentioned in Section 2, the HSV discriminator ranges can influence the measures. Here, the effects of several HSV discriminators, listed in Table 4, are studied; these effects are also illustrated in Figure 10. The chosen discriminator is divided into the three ranges listed in Section 2.1: the first range is the main discriminator for omitting intestinal juice in the GI tract, the second is used to omit extraneous matter with very different color values, and the third is used to omit dark regions by thresholding the intensity (value) channel.
As is clear in Figure 10, hue discriminates the colors of different objects in a frame. Usually, normal tissues have hue values lower than 0.1. The second, third, and fourth rows of Figure 10 (especially the fourth row) demonstrate that most of the informative tissue remains after applying a discriminator with hue higher than 0.1.
Table 4. The HSV discriminators.
Table 5. The tested values for selecting the best sigmoid function cut-off value and their related results.
Therefore, the procedure of finding the best range starts by setting Hue = 0.1. To optimize the ranges, the best value is found by selecting a pace (step size) and following the procedure in Figure 11: the best value is selected first for the hue channel, then for the saturation channel, and finally for the value channel.
Fig. 10. The first row shows the original frames; the parts remaining after using discriminators one to seven are shown in the second to eighth rows, respectively. The last row shows the parts remaining after applying the chosen discriminator to the original frames.
As discussed, the set point and pace for the hue value are 0.1 and 0.01, respectively. The same algorithm is applied to determine the saturation and value parameters; their set points are 0.6 and their paces are 0.1. The results of these processes are shown in Figure 12. The algorithm for finding the best HSV discriminator is applied to select the first and third ranges, with the results illustrated in Figure 12. The second range is used to omit some extraneous matter and is neither important nor decisive in this study. Moreover, the same algorithm is applied to select the median filter window sizes, for which 25 and 35 pixels are chosen (a minimal sketch of this parameter sweep is given below).
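A minimal sketch of the parameter selection procedure of Figure 11 follows: starting from a set point, one parameter is swept with a fixed pace and the value with the best score is kept. The weighting of sensitivity by 3 and specificity by 1 (used later for the sigmoid cut-off) is shown as an example criterion, and the number of steps is our assumption.

def sweep_parameter(evaluate, set_point, pace, n_steps=20):
    """evaluate(value) -> (sensitivity, specificity); returns the best tested value."""
    best_value, best_score = set_point, float("-inf")
    for k in range(n_steps):
        value = set_point + k * pace
        sens, spec = evaluate(value)
        score = (3 * sens + spec) / 4.0        # weighted-average selection criterion
        if score > best_score:
            best_value, best_score = value, score
    return best_value

# e.g. the hue threshold: sweep_parameter(evaluate_hue, set_point=0.1, pace=0.01)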
To select the sigmoid function parameters (Cutoff and Gain), random sub-sampling cross validation is again used, and the chosen parameters are those giving the best performance. Based on equation (2), the Gain is not a determinant factor compared with the Cutoff value; a few tests show that any Gain higher than one can be treated as a constant, so Gain = 50 is selected and the effect of the Cutoff value is then studied. The algorithm illustrated in Figure 11 is applied to select the best Cutoff: the pace is 0.1 and two set points are defined using random sub-sampling cross validation. The important point here is the selection of the uninformative regions; therefore, the sensitivity and specificity
are calculated, the sensitivity is weighted by 3 and the specificity by 1, and the weighted average determines the best Cutoff; the procedure is illustrated in Table 6.
Fig. 11. The proposed procedure for finding the best values of the parameters.
Fig. 12. The results obtained during the procedure of selecting the best HSV discriminator. The black, green, and blue points illustrate the results for hue, saturation, and value in the first range, based on the method in Figure 11. The red points show the results obtained while finding the best value for the third range.
The next parameter is the sigma used in the second method (the LOG parameter). Again using the parameter optimization algorithm (Figure 11), two set points (0.08 and 0.2) are selected and the pace is 0.02. The best value for the LOG sigma is 0.09, but most of the tested values emphasize the bubbled regions very well and the differences are very small (less than 0.002 change in sensitivity). It is important to note that if sigma (the LOG parameter) is higher than 0.5, the output image is completely dark (no suspected regions are segmented), and if it is lower than 0.05, it is difficult to find the bubbled regions; so good results can be obtained when sigma is between 0.05 and 0.5.
The last parameter is the median filter, which is used twice in both methods. In the first method, it is used to smooth the frame after the morphological operations and then to smooth the result after using
the HSV discriminators. In the second method, it is applied after the HSV discriminator and at the end of the processes, after the neural network. In the first method, the median filter was designed with a window size of 35×35 and was used again with a 15×15 window; in the second method, both median filters use a 15×15 window. To select the best window sizes, the same algorithm is used (as illustrated in Figure 11): random sub-sampling cross validation is used to determine two set points lower than 100, and the pace is 10. The parameters are optimized in sequence to obtain the best result. In the first method, the first median filter is designed first; the sigmoid function is the second parameter (sensitivity is more important); the last median filter is optimized on the neural network output frame, without considering the HSV discriminator (both sensitivity and specificity are important); and the HSV ranges are the last parameter (both sensitivity and specificity are important). One more point is important: the first median filter window size is selected by random sub-sampling cross validation between 10 and 50, although the sigmoid parameters were the determinant factor in the results. The last median filter does not change the measures, but it affects the edges and improves visualization by smoothing the sub images; these effects are shown in the seventh and eighth columns of the first row of Figure 6.
Table 6. The tested values for selecting the best sigmoid function cut-off value and their related results.
One question remains unanswered: why are the MM operations applied twice? Each time the MM operations are applied, the bubbled parts are magnified. The first pass emphasizes and magnifies the small bubbles, and the second pass emphasizes the round objects produced by the first pass. The second application of the MM operations ensures that all bubbled parts are selected, since after the first pass some small bubbles are not yet large enough for the subsequent detection steps.
To compare our study with the previous studies, which address frame reduction rather than detection of bubbled regions within a frame, 100 frames are tested to examine the accuracy of the methods for frame reduction. A threshold is therefore set to classify the frames using the three methods: if the bubbled regions in a frame cover more than 90% of the total pixels (more than 235,930 pixels), the frame is considered totally uninformative. The results are illustrated in Figure 13. For the frame reduction data set, we selected 50 frames for each class (informative and uninformative frames).
Fig. 13. The results of frame reduction. (a), (b), (c), and (d) show, respectively, the frame classes that should be found by the methods, the frames classified by the first method, the frames classified by the second method, and the frames classified by the third method.
4. Conclusion
In this paper, three new methods were presented for automatic bubble and juice detection. The proposed methods focus on detecting uninformative regions within a frame, whereas in the previous studies (Vilarino et al., 2006a, 2006b; Li and Meng, 2012; Haji Maghsoudi and Soltanian-Zadeh, 2013), methods were proposed to reduce the number of frames in a captured video. The precision, sensitivity, accuracy, and specificity obtained for segmenting the intestinal uninformative regions are 97.76%, 97.80%, 98.15%, and 98.40%, respectively, for the first method; 93.32%, 84.60%, 91.05%, and 95.67% for the second method; and 93.32%, 84.60%, 91.05%, and 95.67% for the third method. As is clear in Figure 14, the second method has lower accuracy and precision, and the differences are especially obvious for the detection of air bubbled regions; this is because the LOG filter cannot distinguish between informative tissue and air bubbled tissue. The second method has lower accuracy and precision, but it is quicker than the first. The results of the third method are lower than those of the first two; however, the third method contains only three steps and is devised without a neural network or classifier, which makes it faster than the others.
As mentioned in the introduction, other works focused on time reduction, not on improving frames for subsequent processing. Moreover, those studies did not address air bubbled parts, which are more similar to normal tissue in a frame. The only related paper is Haji Maghsoudi and Soltanian-Zadeh (2013), which proposed methods to distinguish between diseased regions and extraneous matter; they reported relatively low accuracy and precision for the detection of extraneous matter, between 60% and 80%, respectively.
However, to compare our methods with those proposed in the previous literature, we examine the methods using a threshold (90%) to classify informative and uninformative frames; the uninformative frames can then be omitted after the processing. The experimental results are illustrated
in Figure 13. Given the accuracy of the methods in detecting uninformative and informative regions, the methods are able to omit frames from a video. As expected, the first method is the most reliable for frame reduction, with only one mistake found among the uninformative frames.
The first problem in our work was the different size ranges of bubbles and juice, which affect the morphological operations used to enlarge the bubble shapes in the first method and the LOG filter in the second method. Our second issue was the number of features produced by the Gabor filter bank and the GLCM feature extraction in the first method, so the Fisher score test was used to reduce the number of features. An MLP was used as the classifier; another choice could be a support vector machine (SVM). In addition, the results were calculated for different uninformative types, such as total uninformative regions and air bubbled regions, and the effect of the HSV discriminators in the first method is also shown; both are illustrated in Figure 14. The most demanding problem was determining the various parameters.
Fig. 14. The blue (total uninformative parts) and green (only air bubbled and dark parts) bars show the results of the first method, while the red (total uninformative parts) and yellow (only air bubbled and dark parts) bars show the results of the second method.
An algorithm (illustrated in Figure 11) was used to optimize and find the best parameters. Random sub-sampling cross validation helped us in some cases to find the best values (such as the neural network hidden layers, the sigmoid function, and the HSV discriminator), and the character of the function was used in other cases (such as the HSV discriminator, the sigmoid function, and the LOG filter). These algorithms can work on different digestive organs. Future research will be directed toward automatic detection of diseases and tumor regions in the processed frames; based on these results, we expect to be able to detect diseases accurately.
References
Akakin, H. C., Sarma, S. E., 2013. A Generalized Laplacian of Gaussian Filter for Blob Detection and Its Applications, IEEE Transactions on Cybernetics, vol. 99, 1-15.
Altman, D. G., Bland, J. M., 1994. Diagnostic tests. 1: Sensitivity and specificity, BMJ, 308, 1552.
Bashar, M. K., Kitasaka, T., Suenaga, Y., Mekada, Y., Mori, K., 2012. Automatic detection of informative frames
from wireless capsule endoscopy images, Elsevier Medical Image Analysis, 14, 449-470.
Bejakovic, S., Kumar, R., Dassopoulos, T., Mullin, G., Hager, G., 2009. Analysis of Crohn’s Disease Lesions in
Capsule Endoscopy Images, IEEE International Conference on Robotics and Automation, 2793-2798.
Bovik, A. C., Clark, M., Geisler, W. S., 1990. Multichannel Texture Analysis Using Localized Spatial Filters, IEEE
Transactions on Pattern Analysis and Machine Intelligence, 12, 55-73.
Chan, T., Vese, L., 2001. Active contours without edges, IEEE Trans. Image Process., 10, 266-277.
Clausi, D. A., 2002. An analysis of co-occurrence texture statistics as a function of grey level quantization, Can.
J. Remote Sensing, 28, 45-62.
Clausi, D. A., Jernigan, M. E., 2000. Designing Gabor filters for optimal texture separability, Pattern Recognition,
33, 1835-1849.
Coimbra, M., Silva Cunha, J. P., 2006. MPEG-7 visual descriptors - Contributions for automated feature extraction in capsule endoscopy, IEEE Trans. Circuits and Systems for Video Technology, 16, 628-637.
Cromartie, R., Pizer, S. M., 1991. Edge-Affected Context for Adaptive Contrast Enhancement, IPMI 1991, 374-485.
December 2011. Given Imaging. Available: http://www.givenimageing.com.
Döring, C., Lesot, M. J., Kruse, R., 2006. Data analysis with fuzzy clustering methods, Computational Statistics
and Data Analysis, 51, 192-214.
Fritscher-Ravens, F., Swain, P., 2002. The wireless capsule: New light in the darkness, Dig. Dis., 20, 127-133.
Gu, Q., Li, Z., Han, J., 2012. Generalized Fisher Score for Feature Selection, in proceeding of CoRR.
Haji Maghsoudi, O., Soltanian-Zadeh, H., 2013. Detection of Abnormalities in Wireless Capsule Endoscopy Frames using Local Fuzzy Patterns, In Proceedings of the 20th International Conference on Biomedical Engineering (ICBME).
Haji-Maghsoudi, O., Talebpour, A., Soltanian-Zadeh, H., Haji-Maghsoodi, N., 2012. Automatic Organs Detection in WCE, 16th CSI International Symposium on Artificial Intelligence & Signal Processing (AISP), Shiraz, Iran, 116-121.
Haji-Maghsoudi, O., Talebpour, A., Soltanian-Zadeh, H., Haji-Maghsoodi, N., 2012. Segmentation of Crohn's, Lymphangiectasia, Xanthoma, Lymphoid hyperplasia and Stenosis diseases in WCE, in Proceedings of the 24th European Medical Informatics Conference (MIE), Pisa, Italy, 180, 143-147.
Haralick, R. M., Shanmugam, K., Dinstein, I., 1973. Textural Features for Image Classification, IEEE Trans. on
systems, man and cybernetics, 3, 610-621.
32
Omid Haji Maghsoudi, Alireza Talebpour, Hamid Soltanian-Zadeh, Mahdi Alizadeh, and Hossein Asl Soleimani /
Journal of Advanced Computing (2014) Vol. 3 No. 1 pp. 12-34
Haralick, R. M., Sternberg, S. R., Zhuang, X., 1987. Image Analysis Using Mathematical Morphology, IEEE Trans. on Pattern Analysis and Machine Intelligence, 9, 532-550.
Iddan, G., Meron, G., Glukhovsky, A., Swain, P., 2000. Wireless capsule endoscopy, Nature, 405, 417-418.
Jain, K., Ratha, N., Lakshmanan, S., 1997. Object detection using Gabor filters, Pattern Recognition, 30, 295-309.
Junzhou, C., Run, H., Li, Z., Qiang, P., Tao, G., 2011. Contourlet Based Feature Extraction and Classification for
Wireless Capsule Endoscopic Images, 4th International Conference on Biomedical Engineering and Informatics (BMEI), 1, 219-223.
Karargyris, A., Bourbakis, N., 2009. Identification of ulcers in Wireless Capsule Endoscopy videos, IEEE International Symposium on Biomedical Imaging: From Nano to Macro, 554-557.
Karargyris, A., Bourbakis, N., 2011. Detection of Small Bowel Polyps and Ulcers in Wireless Capsule Endoscopy
Videos, IEEE Transactions on biomedical engineering, 58, 2777-2786.
Kumar, R., Zhao, Q., Seshamani, S., Mullin, G., Hager, G., Dassopoulos, T., 2012. Assessment of Crohn’s Disease
Lesions in Wireless Capsule Endoscopy Images, IEEE Transactions on Biomedical Engineering, 59, 355-362.
Leondes, C. T., 1998. Algorithms and Architectures Neural Network Systems Techniques and applications.
Li, B., Meng, M. Q. H., 2009. Computer-Aided Detection of Bleeding Regions for Capsule Endoscopy Images,
IEEE Transactions on biomedical engineering, 56, 1032-1039.
Li, B., Meng, M. Q. H., 2012. Tumor Recognition in Wireless Capsule Endoscopy Images using Textural Features
and SVM-based Feature Selection, IEEE Transactions on Information Technology in Biomedicine, 16, 323-329.
Li, C., Huang, R., Ding, Z., Gatenby, J. C., Metaxas, D. N., Gore, J. C., 2011. A Level Set Method for Image Segmentation in the Presence of Intensity Inhomogeneities with Application to MRI, IEEE Trans on Image
Processing, 20, 2007-2016.
Li, C., Kao, C. Y., Gore, J. C., Ding, Z., 2007. Implicit Active Contours Driven by Local Binary Fitting Energy, Conf.
Computer Vision and Pattern Recognition, IEEE, 12, 1-7.
Lim, J. S., 1990. Two-dimensional signal and image processing, Englewood Cliffs, NJ, Prentice Hall.
Mackiewicz, M., Berens, J., Fisher, M., 2008. Wireless Capsule Endoscopy Color Video Segmentation, IEEE transaction on medical imaging, 27, 1769-1781.
March 2012. http://murphylab.web.cmu.edu/publications/boland/.
Marr, D., Hildreth, E., 1980. Theory of edge detection, Proc. R. Soc. Lond. B, 187-217.
Naik, S. K., Murthy, C. A., 2003. Hue-preserving color image enhancement without gamut problem, IEEE Trans.
on Image Processing, 12, 1591-1598.
Oh, J. H., Hwang, S., Lee, J., Tavanapong, W., Wong, J., Goren, P. C., 2007. Informative frame classification for
endoscopy video, Medical Image Analysis, 11, 110-127.
Olsen, A. R., Gecan, J. S., Ziobro, G. C., Bryce, J. R., 2001. Regulatory Action Criteria for Filth and Other Extraneous Materials V. Strategy for Evaluating Hazardous and Nonhazardous Filth, Regulatory Toxicology and
Pharmacology, 33, 363-392.
33
Omid Haji Maghsoudi, Alireza Talebpour, Hamid Soltanian-Zadeh, Mahdi Alizadeh, and Hossein Asl Soleimani /
Journal of Advanced Computing (2014) Vol. 3 No. 1 pp. 12-34
Pan, G., Xu, F., Chen J., 2011. A Novel Algorithm for Color Similarity Measurement and the Application for
Bleeding Detection in WCE, Journal of Image Graphics and Signal Processing, 3, 1-7.
Pan, G., Yan, G., Qiu, X., Cui, J., 2010. Bleeding Detection in Wireless Capsule Endoscopy Based on Probabilistic
Neural Network, Journal of Medical Systems, 35, 1477-1484.
Ramesh, J., Rangachar, K., Brian, G., 1995. Machine vision, McGraw Hill, New York.
Samson, C., Blanc-Feraud, L., Aubert, G., Zerubia, J., 2000. A variational model for image classification and
restoration, IEEE Trans. Pattern Anal. Mach. Intell., 22, 460-472.
Soh, L., Tsatsoulis, C., 1999. Texture Analysis of SAR Sea Ice Imagery Using Gray Level Co-Occurrence Matrices,
IEEE Transactions on Geoscience and Remote Sensing, 37, 780-795.
Szczypinski, P., Klepaczko, A., Pazurek, M., Daniel, P., 2012. Texture and color based image segmentation
and pathology detection in capsule endoscopy videos, Computer Methods and Programs in Biomedicine,
113(1), 396-411.
Torre, V., Poggio, T. A., 1986. On edge detection, IEEE Trans. Pattern Anal. Mach. Intell., 8, 147-163.
Truc, P. T. H., Kim, T. S., Lee, Y. K., 2011. Homogeneity- and Density Distance-driven Active Contours for
Medical Image Segmentation, Journal of Computers in Biology and Medicine, 41, 292-301.
Vilarino, F., Spyridonos, P., Pujol, O., Vitria, J., Radeva, P., 2006a. Automatic Detection of Intestinal Juices in
Wireless Capsule Video Endoscopy, The 18th International Conference on Pattern Recognition (ICPR '06),
4, 719-722.
Vilarino, F., Kuncheva, L., 2006b. ROC curves and video analysis optimization in intestinal capsule endoscopy.
Pattern Recognition Letters, 27, 875-881.
Zhang, K., Zhang, L., Song, H., Zhou, W., 2010. Active contours with selective local or global segmentation: A
new formulation and level set method, Elsevier, Journal of Image and Vision Computing, 28, 668-676.
Zimmerman, J., Pizer, S., Staab, E., Perry, E., McCartney, W., Brenton, B., 1988. Evaluation of the effectiveness
of adaptive histogram equalization for contrast enhancement, IEEE Transactions on Medical Imaging, 7,
304-312.