Biomedical Signal Processing and Control 56 (2020) 101734
Convolutional neural network approach for automatic tympanic
membrane detection and classification
Erdal Başaran a, Zafer Cömert b,∗, Yüksel Çelik a
a Karabük University, Department of Computer Engineering, Karabük, Turkey
b Samsun University, Department of Software Engineering, Samsun, Turkey
Article info
Article history:
Received 5 February 2019
Received in revised form 26 August 2019
Accepted 13 October 2019
Available online 30 October 2019
Keywords:
Biomedical signal processing
Clinical decision support system
Otitis media
Tympanic membrane detection
Convolutional neural network
Classification
Abstract
Otitis media (OM) is a term used to describe the inflammation of the middle ear. The clinical inspection
of the tympanic membrane is conducted visually by experts. Visual inspection leads to variability
among the observers and includes human-induced errors. In this study, we sought to solve these problems
using a novel diagnostic model based on a faster regional convolutional neural network (Faster R-CNN)
for tympanic membrane detection, and pre-trained CNNs for tympanic membrane classification. The
experimental study was conducted on a new eardrum dataset. The Faster R-CNN was initially applied to
the original images. The number of images in the dataset was subsequently increased using basic image
augmentation techniques such as flip and rotation. We also evaluated the success of the model in the
presence of various noise effects. The original and automatically extracted tympanic membrane patches
were finally input separately to the CNNs. The AlexNet, VGGNets, GoogLeNet, and ResNets models were
employed. This resulted in an average precision of 75.85% in the tympanic membrane detection. All CNNs
in the classification produced satisfactory results, with the proposed approach achieving an accuracy of
90.48% with the VGG-16 model. This approach can potentially be used in future otological clinical decision
support systems to increase the diagnostic accuracy of the physicians and reduce the overall rate of
misdiagnosis. Future studies will focus on increasing the number of samples in the eardrum dataset to
cover a full range of otological conditions. This would enable us to realize a multi-class classification in
OM diagnosis.
© 2019 Elsevier Ltd. All rights reserved.
1. Introduction
Otitis media (OM), which is a type of middle ear infection, is one
of the most common pediatric diseases [1]. OM usually develops as
a complication of an upper respiratory tract infection starting from the nasal
cavity. It is a global health problem, which can either heal spontaneously or cause serious undesirable conditions such as speech
defects, hearing loss, and cognitive disorders [2]. Almost two-thirds
of all children under the age of seven experience this disease [3].
Additionally, it is also one of the leading causes of hearing loss in
childhood [4]. Various technological advances in medicine, such as otoscopy [5], pneumatic otoscopy, acoustic reflectometry, ultrasound evaluation, digital imaging, combined tympanometry/visual evaluation, and acoustic tympanometry, have begun to play an increasingly important role in the detection of otitis illnesses. In clinical practice, otoscopy devices are frequently used to diagnose OM by examining the status of the eardrum. Otoscope devices in the current standard of care consist of a camera, a halogen light source, a low-power magnifying lens, and a port that connects to a computer for the storage of images and videos [1,6].

∗ Corresponding author at: Canik Yerleşkesi Gürgenyatak Mahallesi Merkez Sokak No: 40-2/1 55080, Canik/SAMSUN, Turkey.
E-mail addresses: ebasaran@beu.edu.tr (E. Başaran), zcomert@samsun.edu.tr (Z. Cömert), yukselcelik@karabuk.edu.tr (Y. Çelik).
https://doi.org/10.1016/j.bspc.2019.101734
1746-8094/© 2019 Elsevier Ltd. All rights reserved.
The most common clinical manifestations of OM are perforation of the eardrum and inflammation and fluid accumulation in the middle ear [7]. There are different types of OM corresponding to various deformations in the eardrum. OM is generally categorized into three different classes. The first is acute otitis media (AOM), which is usually caused by bacteria in the middle ear cavity and mainly develops after a cold or flu [8,9]. AOM is also the most common type of OM: 90% of children below two years of age tend to experience acute otitis [10]. The basic symptoms for the diagnosis of AOM include the presence of liquid in the middle ear, bulging of the tympanic membrane or a reduction in its movements, redness on or liquid behind the membrane, and the absence of tympanic membrane mobility [11]. AOM can also be classified as mild, moderate, or severe based on the tympanic membrane findings and clinical
symptoms [12], and is one of the common reasons for the prescription of antibiotics [13]. The second common type of OM is effusion otitis media (EOM), which exhibits fluid accumulation in the middle ear without the symptoms of AOM [14]. Otoscopic examination of the membrane reveals the basic signs of EOM, which include the accumulation of mucoid fluid, the presence of an air–fluid level or a loss of gloss, and bubbles behind the membrane [15,16]. The early diagnosis of EOM is important, as its symptoms are insidious [2]. The last common type of OM is chronic suppurative otitis media (CSOM) [17], which is common among children and leads to a decrease in quality of life. CSOM causes long-term damage via infection and inflammation of the middle ear. The symptoms of CSOM include perforation of the eardrum or its adherence to the middle ear wall, continuous or chronic discharge from the middle ear, and erosion of the middle ear ossicles.
Vaccinations and antibiotics are routinely used clinically for
treatment and prevention. Several guidelines have also been published for the identification and description of detailed tympanic
membrane findings, which suggest appropriate diagnosis and initial treatment plans for patients [9,12,18].
The tympanic membrane and ear canal are currently examined
visually in the clinic. Visual inspection leads to variability among the observers during diagnosis, includes human-induced errors, and is not objective [13]. Moreover, there is limited usage
of computer-aided diagnosis or expert systems in this field [19]. In
this study, we adopted convolutional neural networks (CNNs) for
automatic tympanic membrane detection and classification tasks to
overcome the above-mentioned disadvantages and enable increasingly objective examinations. In conventional computational approaches, a set of image processing techniques is used to focus on the region of interest (ROI), and troublesome feature extraction processes are carried out to describe the eardrum images before the classification task [20]. Complete and coherent diagnosis tools are useful for addressing these disadvantages, and the object detection and classification tasks can be accomplished with high accuracy using CNNs [21]. We have thus devised a faster regional convolutional neural network (Faster R-CNN) approach for automatic tympanic
membrane detection, as an assessment of its physical status is
critical for evaluating the degree of disease, selecting a treatment
method, and for automated OM diagnosis. We evaluated the performance of the Faster R-CNN method for the automatic detection of
the tympanic membrane. Furthermore, the original and automatically detected tympanic membrane patches were separately input
to the pre-trained deep CNN models. We employed the AlexNet
[22], VGGNets [23], GoogLeNet [24], and ResNets [25] models. The
classification task was treated as a binary classification due to the
limited number of samples, even though there are three common
types of OM as discussed previously. This enabled us to provide a
consistent diagnosis model for the detection and classification of
the tympanic membrane.
2. Related studies
Various computational approaches have been proposed in the
literature to evaluate eardrum images. A multi-class classification
task covering AOM, EOM, and no-effusion has been introduced
based on a vocabulary and grammar approach, wherein otoscopists
and engineers have described an extensive feature set. Otitis media
vocabulary matches the visual cues of the disorders, while the otitis media grammar corresponds to the use of the vocabulary set in
the decision process. The researchers achieved a classification accuracy of 89.9% in the decision process [26]. A preliminary study has
been presented based on image processing techniques and color
distribution for Otorhinolaryngology. Researchers have previously
combined the probability density function of the color compo-
nent, Bayesian decision rule, and two regression models. It was
proposed that color by itself could not present adequate discriminative features for the identification of otitis [27]. An interactive
decision support tool known as the Cyclops Auris Wizard has been introduced into the daily clinical routine for the quantitative analysis of middle ear pathology extensions. Software incorporating digital image processing and geometry techniques for otology pathologies has been designed, taking the visual and subjective assessments of a specialist into consideration. This software can measure the perforation proportion of the eardrum [28]. A Depth-First Search algorithm has been developed to identify OM at home
and shorten the diagnosis timeline by transferring the real-time
OM images via smartphones [29]. A complete hybrid feature-based
system composed of the segmentation, feature extraction, feature
selection, and classification steps has been developed to categorize
different types of OM. Eardrum images in the above study were
segmented using active contours, while the histogram of oriented gradients (HOG) and local binary pattern (LBP) descriptors were used to extract the features. Finally, the AdaBoost algorithm was employed for feature selection and classification, which resulted in an 88.06% classification accuracy [30]. The OM classification task has been realized using global image features and six machine learning algorithms, namely the k-nearest neighbor (kNN), decision tree (DT), linear discriminant analysis (LDA), Naive Bayes, multi-layer neural network (MLN), and support vector machine (SVM) methods. The experimental results of the study indicated that the SVM produced the best mean classification accuracy of 72.04% [31]. A portable
video otoscopy platform has also been presented to enhance the
quality of the eardrum images. The proposed model uses digital
image processing techniques, along with a continuity-based segmentation and Laplacian kernel [32]. Another model employing
image processing techniques and the decision tree classification
algorithm has been introduced for the automated diagnosis of
OM. This model could diagnose the AOM, EOM, earwax or foreign
body obstruction, and identify the normal tympanic membrane; it
achieved an 80.6% classification accuracy. A model incorporating
a low-cost custom-made video-otoscope has previously achieved
a classification accuracy of 78.7% [33]. A smartphone- and cloud-based system for the automated diagnosis of otitis media has also
been proposed. Both image processing techniques and neural networks have been employed for diagnosis. The proposed system can
detect five different types of OM with an accuracy of 86.84% using
a neural network [34].
The rest of this paper is organized as follows: Section 3 describes
the materials and methods used in this study. The results and
discussion have been presented in Sections 4 and 5, respectively.
Lastly, Section 6 details the concluding remarks.
3. Material and methods
Tympanic membrane images were collected from patients who
volunteered for the study. The location of the tympanic membrane
was determined by three experienced otolaryngologists. A set of
preprocessing procedures, described in Section 3.1.2, were applied
to the images before training the models. The pre-trained deep
CNN models were used for the classification task. The original and
automatically determined tympanic membrane patches were separately input to the pre-trained deep CNN models. The flowchart of
the proposed model is illustrated in Fig. 1.
Fig. 1. The flowchart of the proposed model for automatic tympanic membrane detection and classification.

3.1. The eardrum data set

3.1.1. Image acquisition

In this study, we generated a new eardrum data set by collecting images from eligible patients examined at the Özel Van Akdamar Hospital in Turkey between 10/2018 and 1/2019. An experienced
otolaryngologist initially examined the patients using a standard
otoscopy device. The eardrum images from this device were saved
on a personal computer via a USB connection. The otolaryngologist’s diagnosis was subsequently saved by locating the images in
the predefined folders. The folder names were used as labels for
storing the images. The location of the tympanic membrane was
labeled using the Image Labeler app in MATLAB (R2018b). This procedure enabled us to obtain ground-truth values. In the final stage, two other otolaryngologists validated the location of the tympanic membrane in each otoscope image and determined its class. In other words, each image in the data set was assigned an output label based on the majority vote of three experts. The details of this voting are presented as supplementary material. In addition, only high-quality images suitable for image processing were selected. This led us to obtain 282 eardrum images of various types of OM from over 950 samples. Sequential images from the same patients and corrupted images resulting from hand shaking, insufficient light, and low quality were removed in the elimination process. The images in which the tympanic membrane could not be clearly identified due to various miscellaneous reasons were also excluded from the data set.
3.1.2. Description of the eardrum data set
A set of preprocessing steps were applied to the images before
training the models. The original images had a resolution of
768 × 576 pixels. First, the image contrast was enhanced using the histogram equalization technique. The images were then resized to 64 × 64 pixels to ensure a suitable training duration for the Faster R-CNN model. The image sizes during the classification process were set to 227 × 227 pixels for AlexNet and 224 × 224 pixels for the other deep CNN models.
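The preprocessing chain can be sketched in a few lines. The snippet below is an illustrative Python/OpenCV reconstruction rather than the MATLAB code used in the study, and the choice to equalize only the luma channel is our assumption, since the paper does not state how histogram equalization was applied to the color images.

```python
import cv2  # OpenCV for image I/O and preprocessing

def preprocess(path, target_size):
    """Equalize contrast and resize an otoscope image.

    target_size is (64, 64) for the Faster R-CNN detector and
    (227, 227) or (224, 224) for the pre-trained classifiers.
    """
    img = cv2.imread(path)  # BGR image, originally 768 x 576 pixels
    # Equalize the luma channel only, so colors are not distorted
    # (an assumption; the paper does not specify the color handling).
    ycrcb = cv2.cvtColor(img, cv2.COLOR_BGR2YCrCb)
    ycrcb[:, :, 0] = cv2.equalizeHist(ycrcb[:, :, 0])
    img = cv2.cvtColor(ycrcb, cv2.COLOR_YCrCb2BGR)
    return cv2.resize(img, target_size)

detector_input = preprocess("eardrum_001.png", (64, 64))
classifier_input = preprocess("eardrum_001.png", (224, 224))
```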
The patients included 178 males and 104 females, aged between
2 and 71 years. The demographic information of the eardrum data
set is given in Table 1.
We employed image augmentation techniques such as flip and
rotation without disturbing the morphological structure of the digital otoscope images to increase the success of the model and
provide adequate data to the deep networks. This resulted in the
number of samples in the data set being increased from 282 to 1692.
Details of the sample distribution in the data set are given in Table 2.
Table 1
Demographic information on the eardrum data set.

Patients
    Number (total): 282
Age (year)
    Range: 2–71
    Mean: 8
    Median: 5
Gender
    Male: 178
    Female: 104
Table 2 shows that there are also six abnormal classes in addition
to the normal ones. However, binary classification was used in the
experiments, as there were insufficient samples per class. This led
us to collect all abnormal types of OM in the abnormal class.
We have made this data set publicly available for free download
on http://www.ctganalysis.com/Category/otitis-media.
3.2. Image augmentation techniques
We also used an augmented data set in addition to the original data set, which was derived from the original samples using
image augmentation techniques. Image augmentation is a useful
technique to increase the success of the model when there are
limited samples in a data set. Popular basic image augmentation
techniques include Flip, Rotate, Scale, Crop, Translation, and Noise.
Furthermore, generative adversarial networks (GANs), interpolation, and machine learning algorithms can be used for the above
purpose.
We only utilized the Flip and Rotate techniques in the experiments to avoid disturbing the morphological structure of the
samples in the data set. A Flip procedure involves a simple static
transformation to generate a new image by a mirror-reversal of
the original along the horizontal or vertical axis. In the Rotate method, a new image is generated by rotating the original counterclockwise by a given angle around its center. Two additional samples were generated from each original image by applying the
flip operation along the horizontal and vertical axes. Furthermore,
three additional samples were generated by employing the rotate
method with angles of 90, 180, and 270 degrees. The number of
samples in the original data set was thus increased from 282 to
1692. Fig. 2 shows the result of the image augmentation process
for a sample in the eardrum data set.
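Because the augmentation is limited to these five fixed transformations, it can be reproduced in a few lines. The sketch below uses Pillow as an illustration (the study itself used MATLAB); each original image yields five additional samples, which is how the data set grows from 282 to 1692 images.

```python
from PIL import Image

def augment(img):
    """Return the five augmented variants used in this study:
    horizontal flip, vertical flip, and 90/180/270 degree rotations."""
    return [
        img.transpose(Image.FLIP_LEFT_RIGHT),  # horizontal mirror
        img.transpose(Image.FLIP_TOP_BOTTOM),  # vertical mirror
        img.rotate(90, expand=True),           # counterclockwise rotations
        img.rotate(180, expand=True),
        img.rotate(270, expand=True),
    ]

original = Image.open("eardrum_001.png")
samples = [original] + augment(original)  # 1 original + 5 variants = 6 images
```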
3.3. Principles of convolutional neural networks

CNN models comprise several deep layers, each of which has a different task in the architecture. Convolution, pooling, and fully connected layers are some of the commonly used layers [35].

Convolutional layers (CONV): The convolution layers identify distinctive local features in the input images. The feature maps of the previous layer can be denoted by $X_i^{l-1}$. These maps are convolved with the learnable kernels $k_{ij}^l$, and a trainable bias parameter $b_j^l$ is added. The result is then passed through an activation function $f(\cdot)$ to generate the output feature map. The above-detailed process can be expressed as shown in Eq. (1):

$$X_j^l = f\left(\sum_{i \in M_j} X_i^{l-1} \ast k_{ij}^l + b_j^l\right) \quad (1)$$

where $M_j$ denotes the input map selection. Recently, rectified linear unit (ReLU) activation functions are increasingly being used due to their success over the logistic sigmoid and hyperbolic tangent functions [36].

Pooling layers (POOL): Pooling layers realize a downsampling operation using different techniques such as average or maximum pooling. This reduces the number of computational nodes and prevents overfitting [37]. This process can be expressed as shown in Eq. (2):

$$X_j^l = \mathrm{down}\left(X_j^{l-1}\right) \quad (2)$$
where the $\mathrm{down}(\cdot)$ function represents the downsampling operation. This approach provides a summary of the local distinctive features.
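As a concrete illustration of Eqs. (1) and (2), the following NumPy sketch computes a single output feature map from a set of input maps and then downsamples it with 2 × 2 maximum pooling. It is a didactic reconstruction of the formulas, not an excerpt from any CNN library, and it follows the usual CNN convention of implementing the convolution as cross-correlation.

```python
import numpy as np

def conv_feature_map(X_prev, kernels, bias):
    """Eq. (1): X_j = f(sum_i X_i * k_ij + b_j), with f = ReLU.
    X_prev: list of 2-D input maps (the selection M_j);
    kernels: one 2-D kernel per input map; bias: scalar b_j."""
    h, w = X_prev[0].shape
    kh, kw = kernels[0].shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for X_i, k_ij in zip(X_prev, kernels):          # sum over i in M_j
        for r in range(out.shape[0]):
            for c in range(out.shape[1]):
                out[r, c] += np.sum(X_i[r:r + kh, c:c + kw] * k_ij)
    return np.maximum(out + bias, 0.0)              # ReLU activation f(.)

def max_pool(X, size=2):
    """Eq. (2): X_j = down(X_j), here 2 x 2 maximum pooling."""
    h, w = X.shape[0] // size, X.shape[1] // size
    return X[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))

maps = [np.random.rand(8, 8) for _ in range(3)]     # three toy input maps
kernels = [np.random.rand(3, 3) for _ in range(3)]
feature = conv_feature_map(maps, kernels, bias=0.1) # 6 x 6 feature map
pooled = max_pool(feature)                          # 3 x 3 summary
```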
Fully connected layers (FC): After the data pass through several convolution and subsampling layers, fully connected layers realize a full connection between their neurons and all activations in the previous layer. The aim of an FC layer is to use the discriminative features to classify the input image into various classes based on the training data set [38].
Optimization: The training process is executed using optimizers such as RMSProp, stochastic gradient descent with momentum (SGDM), and adaptive moment estimation (ADAM). The weights in the SGDM method are updated regularly for each training step to reach the goal as quickly as possible [39]:

$$V_t = \beta V_{t-1} + \alpha \nabla_W L(W, X, y) \quad (3)$$

where $L$ symbolizes the loss function, $\alpha$ is the learning rate, and $W$ denotes the weights, which are updated according to Eq. (4):

$$W = W - \alpha V_t \quad (4)$$
The RMSProp optimizer keeps a separate learning rate for each parameter and adapts it according to a moving average of the recent gradient magnitudes. This approach operates well in both online and non-stationary situations and fulfills the parameter update using momentum on the scaled gradient [40].

The ADAM optimizer updates the learning rate in each iteration, adapting the per-parameter learning rates based on the first moment (mean) of the gradients as in the RMSProp method, and it also uses the average of the second moments of the gradients. This method has been designed to retain the advantages of the RMSProp method [41].
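A minimal sketch of the SGDM update of Eqs. (3) and (4) is given below; the momentum value β = 0.9 and the toy quadratic loss are our assumptions for illustration.

```python
import numpy as np

def sgdm_step(W, V, grad, lr=0.1, beta=0.9):
    """One SGDM update exactly as written in the text:
    Eq. (3): V_t = beta * V_{t-1} + lr * grad(L)
    Eq. (4): W   = W - lr * V_t
    beta = 0.9 is an assumed momentum value."""
    V = beta * V + lr * grad
    return W - lr * V, V

# Toy example: minimize L(W) = ||W||^2, whose gradient is 2W.
W = np.array([1.0, -2.0])
V = np.zeros_like(W)
for _ in range(200):
    W, V = sgdm_step(W, V, grad=2 * W)
print(W)  # approaches the minimizer [0, 0]
```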
In this study, we used well-known pre-trained deep models
for the classification task, namely the AlexNet [22], VGGNets [23],
GoogLeNet [42], and ResNets [25] models.
Table 2
The distribution of the samples in the eardrum data set.

Description                   Class                # of samples   # of augmented samples
The healthy tympanic          Normal               154            924
membranes
Abnormal or suspicious        Abnormal (total)     128            768
tympanic membranes            AOM                  69             414
                              Earwax               21             126
                              Myringosclerosis     4              24
                              Tympanostomy tubes   2              12
                              CSOM                 14             84
                              Otitis externa       18             108
Fig. 2. An illustration of the image augmentation process for a sample in the eardrum data set. (a) Original image. (b) Horizontally flipped. (c) Vertically flipped. (d) 90 degrees
rotated. (e) 180 degrees rotated. (f) 270 degrees rotated.
Fig. 3. A block diagram of Region Proposal Network (RPN) [46].
• AlexNet is a basic, pioneering deep model that was introduced in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC-2012) [22]. This network has a depth of eight layers, with 61 million learnable parameters. Its architecture contains five CONV layers followed by three FC layers, with a few max-POOL layers in between. A ReLU is employed in each CONV and FC layer to enable faster training. The input layer of the AlexNet model takes images of size 227 × 227 pixels.
• The VGG network is another important deep CNN model. This model improves the general performance by raising the network depth to 16 and 19 layers [23]. The architecture includes 16 or more CONV/FC layers. Each CONV layer uses small 3 × 3 convolution filters, with a POOL layer inserted between the groups of two or three CONV layers. The input layers of the VGG-16 and VGG-19 networks accept images of size 224 × 224 pixels.
• GoogLeNet introduced a new building block known as the inception module, which runs a few parallel, deeper convolution branches and concatenates their outputs. This increases the depth of the network while maintaining a constant computational complexity. The GoogLeNet network consists of 22 layers and has only 7 million parameters [42], which means that it uses 12× fewer parameters than AlexNet while exhibiting a higher performance.
• The ResNet network offers a residual learning framework. In this architecture, residual blocks are employed to make very deep networks easier to train [25]. This model focuses on the degradation problem and is novel due to the residual blocks and the depth of its architecture. The stacked layers in a conventional deep learning model fit a desired underlying mapping directly, whereas the ResNet model permits these layers to fit a residual mapping.
In this study, we adopted pre-trained deep CNN models to classify the tympanic membrane images into the normal and abnormal
categories. Further details on these models can be found in the
related papers.
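To make the transfer-learning setup concrete, the sketch below re-heads a pre-trained VGG-16 for the two-class eardrum problem in PyTorch. The study itself used MATLAB's pre-trained models, so this is an analogous illustration (assuming torchvision ≥ 0.13 for the weights API), not the authors' code.

```python
import torch.nn as nn
from torchvision import models

# Load ImageNet weights and replace the 1000-way classifier head
# with a binary normal/abnormal output layer.
model = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)
model.classifier[6] = nn.Linear(in_features=4096, out_features=2)

# Optionally freeze the convolutional features so only the FC head
# is fine-tuned on the eardrum images (a common transfer recipe).
for p in model.features.parameters():
    p.requires_grad = False
```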
3.4. Basic principles of Faster R-CNN
Object detection and classification are dominant tasks in the image processing field [43]. In the most primitive approach, sliding-window detectors scan windows across the image from left to right and from top to bottom, and a classifier determines whether each window contains an object. Various window sizes and aspect ratios are considered in this process. The patches obtained from the windows are used to feed the classifier and are warped, since many classifiers only allow inputs of a fixed size [44].
Instead of such a brute-force approach, a region proposal method that creates ROIs is suggested for object detection. In selective search [44], each individual pixel is first placed in its own small group, and the groups are then merged based on texture to obtain the possible ROIs. R-CNN is built on a region proposal method utilizing 2000 fixed-size ROIs. The regions are applied as the input to a CNN; in this manner, the distinctive features of the regions are obtained, and the classification is realized using the deep network. In the architecture, fully connected layers are employed to classify the objects as well as to refine the bounding box. R-CNN has several disadvantages: it needs too many proposals to produce an accurate result, many of the regions overlap with each other, and the feature extraction process is carried out individually for the 2000 different ROIs, which makes it a time-consuming procedure [44].
Fast R-CNN comes with a shared feature extractor to avoid the feature extraction process being repeated 2000 times as in R-CNN. It still relies on an external region proposal method with properties similar to those of selective search. To generate the ROI proposals, the feature map produced by the feature extractor and the external region proposals are consolidated using an ROI pooling layer. In this way, the fully connected layers are fed with the patches without a repeated, expensive feature extraction process. Consequently, the training time is reduced significantly compared to R-CNN [45].
Table 3
The analysis of Faster R-CNN.

#   Name                  Type                    Activations   Learnables                        Total Learnables
1   Image input           Image Input             64×64×3       –                                 0
2   conv 1                Convolution             64×64×32      Weights 3×3×3×32, Bias 1×1×32     896
3   relu 1                ReLU                    64×64×32      –                                 0
4   conv 2                Convolution             64×64×32      Weights 3×3×32×32, Bias 1×1×32    9248
5   relu 2                ReLU                    64×64×32      –                                 0
6   rpnConv3×3            Convolution             64×64×32      Weights 3×3×32×32, Bias 1×1×32    9248
7   rpnRelu               ReLU                    64×64×32      –                                 0
8   rpnConv1×1BoxDeltas   Convolution             64×64×32      Weights 1×1×32×32, Bias 1×1×32    1056
9   rpnBoxDeltas          Box Reg. Output         –             –                                 0
10  rpnConv1×1ClsScores   Convolution             64×64×16      Weights 1×1×32×16, Bias 1×1×16    528
11  rpnSoftmax            RPN Softmax             4096×8×2      –                                 0
12  rpnClassification     RPN Cls. Output         –             –                                 0
13  regionProposal        Region Proposal         1×4           –                                 0
14  roiPooling            ROI Max Pooling         31×31×32      –                                 0
15  fc 1                  Fully Connected         1×1×64        Weights 64×30752, Bias 64×1       1968192
16  relu 3                ReLU                    1×1×64        –                                 0
17  fc 2                  Fully Connected         1×1×2         Weights 2×64, Bias 2×1            130
18  softmax               Softmax                 1×1×2         –                                 0
19  Classoutput           Classification Output   –             –                                 0
20  fcBoxDeltas           Fully Connected         1×1×4         Weights 4×64, Bias 4×1            260
21  boxDeltas             Box Reg. Output         –             –                                 0

Table 4
The training options of Faster R-CNN.

                     Options 1   Options 2   Options 3   Options 4
Max Epoch            10–100      10–100      10–100      10–100
Mini-Batch Size      1           1           1           1
Initial Learn Rate   1 × 10⁻⁴    1 × 10⁻⁴    1 × 10⁻⁵    1 × 10⁻⁶
Optimizer            SGDM        SGDM        SGDM        SGDM
The architecture of Faster R-CNN is similar to that of Fast R-CNN; only the external region proposal method is replaced with a convolutional network called the region proposal network (RPN).
The RPN takes the feature map produced by the first CNN in the design as its input and makes k guesses for each location in the feature map. As a result, it produces 4 × k coordinates and 2 × k scores per location, as shown in Fig. 3. Faster R-CNN operates with several anchors that are carefully pre-selected to match real-life objects at different scales and with reasonable aspect ratios.
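The per-location outputs of the RPN can be made concrete with a small sketch. With the k = 8 anchors and 32-channel feature map of Table 3, the two 1 × 1 convolutions below produce 4k = 32 box-regression channels and 2k = 16 objectness channels, matching the rpnConv1×1BoxDeltas and rpnConv1×1ClsScores rows; the PyTorch framing is our illustrative choice.

```python
import torch
import torch.nn as nn

k = 8  # number of anchors per feature-map location (as in Table 3)

# A minimal RPN head over a 32-channel feature map, mirroring the
# rpnConv3x3 / rpnConv1x1BoxDeltas / rpnConv1x1ClsScores rows of Table 3.
rpn_conv = nn.Conv2d(32, 32, kernel_size=3, padding=1)
box_head = nn.Conv2d(32, 4 * k, kernel_size=1)  # 4k = 32 box coordinates
cls_head = nn.Conv2d(32, 2 * k, kernel_size=1)  # 2k = 16 object/background scores

feature_map = torch.randn(1, 32, 64, 64)        # from the shared CNN
h = torch.relu(rpn_conv(feature_map))
box_deltas = box_head(h)   # shape (1, 32, 64, 64)
scores = cls_head(h)       # shape (1, 16, 64, 64)
```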
In this study, we propose a Faster R-CNN for the automatic tympanic membrane detection task. The details of the network architecture, covering the description of the layers, activations, and learnable weights, are given in Table 3.
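For readers who prefer a runnable reference point, the sketch below configures an off-the-shelf Faster R-CNN for a single foreground class in PyTorch/torchvision. Note that it uses a ResNet-50 FPN backbone, whereas our detector uses the small custom CNN of Table 3, so this is only an analogous illustration.

```python
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

# Pre-trained Faster R-CNN detector; the backbone differs from the
# lightweight network of Table 3 (an assumption for illustration only).
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")

# Two classes: background (0) and tympanic membrane (1).
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes=2)
```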
The training of Faster R-CNN is performed in four key steps:

1 Step 1 of 4: Training a Region Proposal Network (RPN).
2 Step 2 of 4: Training a Fast R-CNN network using the RPN from Step 1.
3 Step 3 of 4: Re-training the RPN using weight sharing with Fast R-CNN.
4 Step 4 of 4: Re-training Fast R-CNN using the updated RPN.
As mentioned above, the training of Faster R-CNN is realized in four key steps. For this reason, we have four different sets of options, one for each network, as presented in Table 4. In the experimental study, we also investigated the effects of the maximum number of epochs and the learning rate on the automatic membrane detection task. For this purpose, the maximum number of epochs was tested between 10 and 100. In addition, the learning rate was investigated from 1 × 10⁻³ to 1 × 10⁻⁶. The learning rates in the first two steps of the Faster R-CNN training were set higher than in the last two steps, because the fine-tuning was realized in the last two steps by updating the weights only slightly.
3.5. Performance metrics

In the model evaluation phase, the primary objective metrics for the object detection task are average precision and recall, since these performance metrics are commonly adopted in object detection [47,48]. The efficiency of the Faster R-CNN was improved on the basis of the precision and recall rates by analyzing the different schemes mentioned above. As for the classification, we derived several common metrics from a confusion matrix consisting of the true positive (TP), false positive (FP), false negative (FN), and true negative (TN) indices [49]. The formulations of the metrics are given below:
$$\mathrm{Accuracy\ (Acc)} = \frac{TP + TN}{TP + FP + FN + TN} \quad (5)$$

$$\mathrm{Sensitivity\ (Se)} = \mathrm{Recall} = \frac{TP}{TP + FN} \quad (6)$$

$$\mathrm{Specificity\ (Sp)} = \frac{TN}{TN + FP} \quad (7)$$

$$F\text{-}\mathrm{score} = \frac{2 \, TP}{2 \, TP + FP + FN} \quad (8)$$

$$\mathrm{Average\ Precision\ (AP)} = \frac{TP}{TP + FP} \quad (9)$$
In the object detection task, TP denotes a positive overlap between the real and detected tympanic membrane areas. FN corresponds to a missed area that the model could not detect; this area matches either a part of the tympanic membrane or all of it. TN matches the undetected area that does not include any part of the tympanic membrane. The metrics given above are computed on all test set images to produce the precision and recall (PR) curve. Ideally, the curve should stay close to one along the recall axis.

In the classification task, TP and TN correspond to the numbers of correctly identified abnormal and normal tympanic membrane samples, whereas FP and FN correspond to the numbers of incorrectly identified abnormal and normal tympanic membrane samples, respectively.
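The classification metrics of Eqs. (5)–(8) follow directly from the confusion matrix counts; the short sketch below reproduces the VGG-16 entries of Tables 9 and 10 as a sanity check.

```python
def classification_metrics(tp, fp, fn, tn):
    """Eqs. (5)-(8) computed from the confusion matrix counts."""
    acc = (tp + tn) / (tp + fp + fn + tn)
    se = tp / (tp + fn)               # sensitivity / recall, Eq. (6)
    sp = tn / (tn + fp)               # specificity, Eq. (7)
    f = 2 * tp / (2 * tp + fp + fn)   # F-score, Eq. (8)
    return acc, se, sp, f

# Example with the VGG-16 confusion matrix from Table 9:
print(classification_metrics(tp=199, fp=27, fn=31, tn=250))
# -> (0.8856, 0.8652, 0.9025, 0.8728), matching Table 10
```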
In addition to the above performance metrics, we employ the receiver operating characteristic (ROC) curve, since it is a useful technique for measuring model success in a binary classification task. In this scope, the area under the curve (AUC) is calculated. The ROC is a probability curve, and the AUC represents the degree or measure of separability; it explains how capable the model is of distinguishing between the classes. The ROC curve is drawn with the TP rates against the FP rates, where the TP rates are located on the y-axis and the FP rates on the x-axis [50].

Table 5
Division of the samples for training and testing.

                          Train   Test
The original data set     196     86
The augmented data set    1185    507
4. Results
All training and testing processes were run on a workstation equipped with an NVIDIA Quadro P6000 GPU and an Intel(R) Xeon(R) Gold 6132 CPU @ 2.60 GHz, using the MATLAB (R2018b) software.
The data in the training and testing processes of the Faster R-CNN approach were divided into two parts at rates of 70% and 30%. This implies that 196 and 86 eardrum images were used to train and validate the model for the original data set, while 1185 and 507 eardrum images were used to train and validate the model for the augmented data set, as described in Table 5.
As previously mentioned, four different experiments were run to evaluate the success of the Faster R-CNN model in the tympanic membrane detection task. First, the model was applied to the original data set, which had 282 samples. The most efficient learning rates were determined by trial and error as 1 × 10⁻⁴, 1 × 10⁻⁴, 1 × 10⁻⁵, and 1 × 10⁻⁶ for the networks in the Faster R-CNN architecture. The learning rate in such experiments should be neither too large nor too small, to achieve ideal convergence within an appropriate training time. Additionally, the maximum number of epochs was varied between 10 and 100. The best AP value of 0.6772 was obtained when the maximum number of epochs was set to 25 for the original eardrum data set. The tympanic membrane was not detected in only eight of the 86 samples. The times required for the training and testing processes were 19.37 and 0.0753 min, respectively.
In the second experiment, the number of samples in the eardrum data set was increased from 282 to 1692 using image augmentation techniques. The experiment was repeated with the same hyperparameters. The best AP obtained was 0.7585 when the maximum number of epochs was set to 100. The tympanic membrane was not detected in only 34 of the 509 samples. The times required for training and testing the Faster R-CNN were 529.56 and 0.5220 min, respectively. In this experiment, we observed an increase in the Faster R-CNN performance in the tympanic membrane detection task as compared to the original data set. Increasing the maximum number of epochs had a positive effect on the model performance.
The effects of noise on the success of the model were evaluated in the third and fourth experiments, using Gaussian and salt and pepper noise, respectively. The experiments were run on the augmented data set, as in the second experiment. This resulted in an AP value of 0.7694 for the augmented data set containing Gaussian noise; the most effective results were obtained when the maximum number of epochs was set to 25. The times required for training and testing the model were 135.089 and 0.4346 min, respectively, and the tympanic membrane was not detected in only four of the 509 samples. The model achieved its best AP value of 0.7952 for the augmented data set with salt and pepper noise; here, the tympanic membrane was not detected in only one sample of the test set, and the training and testing processes required 111.467 min and 0.4215 min, respectively. The model was thus able to detect the tympanic membrane with high precision in the noisy data sets. However, there was also an increase in the number of overlapping regions, which is not ideal for the proposed approach, as it is desirable to extract only one region identifying the tympanic membrane. Table 6 shows the detailed results of the automatic tympanic membrane detection task, with the most effective results highlighted in bold.
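The noise experiments can be emulated with standard utilities; the sketch below uses scikit-image, and the variance and amount values are illustrative assumptions, since the paper does not report the noise parameters.

```python
import numpy as np
from skimage.util import random_noise

img = np.random.rand(64, 64, 3)  # stand-in for a preprocessed otoscope image

# Gaussian noise (third experiment) and salt & pepper noise (fourth
# experiment); var and amount are assumed values for illustration.
noisy_gauss = random_noise(img, mode='gaussian', var=0.01)
noisy_sp = random_noise(img, mode='s&p', amount=0.05)
```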
The results of the automatic tympanic membrane detection task indicate that the model could successfully detect the tympanic membrane and was resistant to noise. Increasing the number of samples in the data set enhanced the model performance. However, there was also a substantial increase in the model training time, from 19.370 min to 529.56 min. This was due to the number of samples in the data set and the mini-batch size, which is fixed in the Faster R-CNN architecture and must equal one. The PR curves of the model for the different experimental setups are illustrated in Fig. 4.
Table 6
Performance results of the Faster R-CNN model in the tympanic membrane detection task.

Data set                       Epoch   AP       # of missing samples   Train Time [min]   Test Time [min]
Original eardrum data set      10      0.4015   31                     7.793              0.0667
                               15      0.5007   25                     11.380             0.0701
                               20      0.6079   17                     14.533             0.0668
                               25      0.6772   8                      19.370             0.0753
                               50      0.6026   16                     36.173             0.0646
                               100     0.6325   9                      70.142             0.0815
Augmented eardrum data set     10      0.5827   98                     60.412             0.4657
                               15      0.6330   72                     84.078             0.5243
                               20      0.5627   83                     113.986            0.4478
                               25      0.6879   30                     137.284            0.4484
                               50      0.6998   30                     272.18             0.4552
                               100     0.7585   34                     529.56             0.5220
Augmented eardrum data set     10      0.7675   8                      87.101             0.4519
with Gaussian noise            15      0.7095   12                     85.651             0.4324
                               20      0.7545   1                      110.171            0.4439
                               25      0.7694   4                      135.089            0.4346
                               50      0.7400   0                      268.187            0.4394
                               100     0.6235   2                      555.808            0.5141
Augmented eardrum data set     10      0.7162   0                      52.4371            0.4661
with salt & pepper noise       15      0.6720   1                      75.2132            0.4767
                               20      0.7952   1                      111.467            0.4215
                               25      0.7342   1                      142.931            0.4264
                               50      0.7408   0                      266.220            0.4198
                               100     0.6476   6                      552.20             0.4772
Fig. 4. Precision and recall curves considering the maximum number of epochs for the (a) original data set, (b) augmented data set, (c) augmented data set with Gaussian
noise, and (d) augmented data set with salt and pepper noise.
The experimental results show that the Faster R-CNN model
is a useful approach to detect the tympanic membrane from the
digital otoscope images. Some of the representative results of this
approach are illustrated in Fig. 5.
Well-known, pre-trained deep CNN models were employed in
the classification task. First, the augmented data set composed of
the original otoscope images was separately input to the deep
CNN models. The models were then fed with the automatically
detected tympanic membrane patches from Faster R-CNN. The
samples where the tympanic membrane was not detected were
used in their original forms.
The initial learning rate, learn-rate drop factor, and learn-rate drop period were adjusted to 0.0001, 0.1, and 8, respectively, after numerous experiments to achieve an efficient configuration for the deep models; the values of these parameters were thus determined by trial and error. The SGDM optimizer was employed for all the models. The maximum number of epochs and the mini-batch size were set to 32 and 16, respectively, in all the models. This resulted in 74 iterations per epoch, with a maximum of 2368 iterations required to train the models. Furthermore, the base learning rate was updated four times during the training phase, as the drop factor was adjusted to 0.1 and the drop period was set to eight epochs. This gave the base learning rate initial and final values of 1 × 10⁻⁴ and 1 × 10⁻⁸, respectively. Another point to be considered is that a predictable sequence of random numbers was used when the data were divided into the training and test sets; the basic aim of this process was to apply the same samples to all the models.
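The resulting stepwise schedule can be written out explicitly; the sketch below reproduces the values described above (initial rate 1 × 10⁻⁴, four drops by a factor of 0.1, final rate 1 × 10⁻⁸), although the exact epoch at which each drop is applied is our assumption.

```python
# Stepwise learning-rate schedule: base rate 1e-4, multiplied by the
# drop factor 0.1 every 8 epochs, giving four updates over 32 epochs.
initial_lr, drop_factor, drop_period, max_epochs = 1e-4, 0.1, 8, 32

prev = None
for epoch in range(1, max_epochs + 1):
    lr = initial_lr * drop_factor ** (epoch // drop_period)
    if lr != prev:  # report each new value once
        print(f"epoch {epoch:2d}: learning rate = {lr:g}")
        prev = lr
# epochs 1, 8, 16, 24, 32 -> 1e-04, 1e-05, 1e-06, 1e-07, 1e-08
```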
The training and validation processes of the deep CNN models fed with the original data set are illustrated in Fig. 6. It can be clearly seen that the models required ∼1000 iterations (14 epochs) to achieve convergence, with no significant improvement beyond this point. This indicates that the training time of the models could be reduced significantly.
All models yielded promising results. The confusion matrices and performance metrics of the models fed with the original samples are reported in Tables 7 and 8, respectively. AlexNet achieved the best classification accuracy of 87.97%. VGG-16, VGG-19, GoogLeNet, ResNet-50, and ResNet-101 achieved accuracies of 86.98%, 86.00%, 86.19%, 84.62%, and 85.01%, respectively. The sensitivity measure is rather significant in this classification task, as it shows the performance of the models in distinguishing the abnormal tympanic membrane samples; here, the ResNet-101 model was at the forefront with a sensitivity of 82.17%. However, AlexNet was superior to the other models in terms of the performance and time required for training: the AlexNet model was trained in only 5.32 min.
The automatically detected membrane patches were input to the models in the second stage of the classification task. The training and validation processes of the models are illustrated in Fig. 7. The confusion matrices and performance results of the models are shown in Tables 9 and 10, respectively.
Fig. 5. A subset of the automatically detected tympanic membranes from the digital otoscope images.
Table 7
Confusion matrices of the CNN models fed with the original otoscope images.

Model        TP a   FP b   FN c   TN d
AlexNet      187    18     43     259
VGG-16       186    22     44     255
VGG-19       186    27     44     250
GoogLeNet    186    26     44     251
ResNet-50    177    25     53     252
ResNet-101   189    35     41     242

a TP, true positive; b FP, false positive; c FN, false negative; d TN, true negative.
Table 8
Performance results of the CNN models fed with the original otoscope images.

Model        Acc (%)   Se (%)   Sp (%)   F-score (%)   AUC      Train Time [min]
AlexNet      87.97     81.30    93.50    85.98         0.9392   5.32
VGG-16       86.98     80.87    92.06    84.93         0.9478   17.36
VGG-19       86.00     80.87    90.25    83.97         0.9346   19.11
GoogLeNet    86.19     80.87    90.61    84.16         0.9393   10.12
ResNet-50    84.62     76.96    90.98    81.94         0.9210   20.84
ResNet-101   85.01     82.17    87.37    83.26         0.9323   26.32
Table 9
Confusion matrices of the CNN models fed with the detected tympanic membrane patches.

Model        TP a   FP b   FN c   TN d
AlexNet      182    14     48     263
VGG-16       199    27     31     250
VGG-19       187    28     43     249
GoogLeNet    190    35     40     242
ResNet-50    179    33     51     244
ResNet-101   179    37     51     240

a TP, true positive; b FP, false positive; c FN, false negative; d TN, true negative.
Fig. 6. The training and validation accuracy and loss of the pre-trained deep CNN models fed with the original data set. (a) Training accuracy, (b) Training loss, (c) Validation
accuracy, and (d) Validation loss of the models.
Table 10
Performance results of the CNN models fed with the detected tympanic membrane patches.

Model        Acc (%)   Se (%)   Sp (%)   F-score (%)   AUC      Train Time [min]
AlexNet      87.77     79.13    94.95    85.45         0.9507   5.66
VGG-16       88.56     86.52    90.25    87.28         0.9537   17.20
VGG-19       86.00     81.30    89.89    84.05         0.9392   18.95
GoogLeNet    85.21     82.61    87.37    83.52         0.9295   9.31
ResNet-50    83.43     77.82    88.08    81.00         0.9057   19.80
ResNet-101   82.64     77.83    86.64    80.27         0.9004   44.93
The experimental results show that all the deep CNN models provided satisfactory results. The VGG-16 model had the best classification performance, with an accuracy, sensitivity, and specificity of 88.56%, 86.52%, and 90.25%, respectively. The AlexNet, VGG-19, GoogLeNet, ResNet-50, and ResNet-101 models had accuracies of 87.77%, 86.00%, 85.21%, 83.43%, and 82.64%, respectively. A duration of 17.20 min was required to train the VGG-16 model. The experimental results indicate that feeding the deep CNN models with the automatically detected tympanic membrane patches elevated the model performances; only the performance results of the ResNet models decreased. Fig. 8 shows the AUCs of the models for the two experimental setups of the classification task, and indicates that the VGG-16 model achieved the best AUC value of 0.9537, with the other deep CNN models having AUC values > 0.90.
Lastly, additional experiments were conducted to validate the model success. In these experiments, we considered the original and the automatically detected patches separately, splitting the data at a different division rate to obtain 50% training and 50% test sets, and also applying a 10-fold cross-validation technique.
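The 10-fold protocol is sketched below with scikit-learn; the stratified splitting and the simple stand-in classifier are our assumptions for illustration, as the actual study trains the deep CNN models in each fold.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold

# Toy stand-ins for the eardrum data: in the real study each sample is an
# otoscope image and each label is normal (0) or abnormal (1).
rng = np.random.default_rng(0)
features = rng.random((282, 16))        # hypothetical feature vectors
labels = rng.integers(0, 2, size=282)   # hypothetical binary labels

accs = []
skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
for train_idx, test_idx in skf.split(features, labels):
    # A deep CNN would be trained here; any classifier shows the protocol.
    clf = LogisticRegression().fit(features[train_idx], labels[train_idx])
    accs.append(clf.score(features[test_idx], labels[test_idx]))

print(f"mean 10-fold accuracy: {np.mean(accs):.4f}")
```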
Table 11 shows the results of the experiments wherein the deep pre-trained models were fed with the original OM samples, considering the 50% training and 50% test data division and the 10-fold cross-validation technique. All models achieved satisfactory results, with the VGG-16 model having a superior classification accuracy of 90.24%.

In the final experiment, the models were fed with the automatically detected tympanic membrane patches; the data set was again equally divided into the training and test sets, and the 10-fold cross-validation was also examined. The VGG-16 model produced the most efficient results, with
Fig. 7. Training and validation accuracy, and loss of the pre-trained deep CNN models fed with the automatically detected tympanic membrane patches. (a) Training accuracy,
(b) Training loss, (c) Validation accuracy, and (d) Validation loss of the models.
Fig. 8. The ROC curves of the models. (a) The models fed with the original samples. (b) The models fed with the automatically detected tympanic membrane patches.
Table 11
Classification results of the models fed with the original images.

             50% training / 50% test split         10-fold cross-validation
Model        Acc (%)   Se (%)   Sp (%)             Acc (%)   Se (%)   Sp (%)
AlexNet      82.62     71.09    92.20              86.64     79.03    92.96
VGG-16       83.33     77.86    87.87              90.24     86.24    93.07
VGG-19       82.27     72.65    90.26              90.24     85.80    93.93
GoogLeNet    82.86     76.56    88.09              88.00     85.54    90.04
ResNet-50    81.79     72.91    89.17              87.47     85.28    89.28
ResNet-101   83.33     79.16    86.79              89.06     87.89    90.04
Table 12
Classification results of the models fed with the detected tympanic membrane patches.

             50% training / 50% test split         10-fold cross-validation
Model        Acc (%)   Se (%)   Sp (%)             Acc (%)   Se (%)   Sp (%)
AlexNet      85.22     77.08    91.99              87.29     80.85    92.64
VGG-16       85.93     80.99    90.04              90.48     86.84    93.50
VGG-19       85.81     78.90    91.55              90.01     86.97    92.53
GoogLeNet    82.97     77.86    87.22              86.76     83.98    89.06
ResNet-50    80.96     74.47    86.36              84.39     81.64    86.68
ResNet-101   80.37     77.86    82.46              84.22     80.46    87.33
Table 13
A comparison between the state-of-the-art methods used for the computational diagnosis of otitis media.

Authors                       Methods                                                                    # of samples   # of classes   Acc (%)
2011, Mironica et al. [31]    Global image features; kNN, DT, LDA, Naïve Bayes, MLN, SVM                 186            2              73.11
2011, Vertan et al. [27]      Color data distribution, Bayesian decision rule                            100            3              59.90
2013, Kuruvilla et al. [26]   Vocabulary and grammar, DT                                                 181            3              89.90
2014, Shie et al. [30]        Active contour segmentation, LBP and HOG features, AdaBoost                865            4              88.06
2016, Myburgh et al. [33]     Visual features, DT                                                        486            5              80.61
2017, Huang and Huang [29]    Image processing, visual features, depth-first search algorithm            20             3              70.00
2018, Myburgh et al. [34]     Visual features, DT, neural networks                                       389            5              81.58
2019, This paper              Faster R-CNN, pre-trained CNN models, VGG-16, 10-fold cross-validation     1692           2              90.48
a classification accuracy of 90.48%. The 10-fold cross-validation
method and models with the automatically detected tympanic
membrane input patches increased the model performance, with
the results shown in Table 12.
5. Discussion
All related studies were compared by taking the various data
sets, methods, and accuracy performance metrics shown in Table 13
into consideration. However, an exact comparison was not possible
due to the use of different data sets and methods.
Table 13 clearly shows that researchers have mostly classified OM diseases based on a combination of visual features from the eardrum images and conventional machine learning techniques. In this study, we developed an end-to-end deep learning model that automatically focuses on the tympanic membrane and eliminates the feature extraction and selection processes. The model makes a decision directly based on the input otoscope images.
The tympanic membrane contains most of the visual cues of the underlying disorder in the otoscope images. Furthermore, the black background around the circular field of view of the endoscopic camera is the most visible feature in the images [27]. This has led some researchers to employ segmentation techniques such as active contours to separate the tympanic membrane from the input otoscope image [30], while some studies have applied this process manually [27]. In our proposed approach, a Faster R-CNN was devised to enable automatic selection of the tympanic membrane. Furthermore, pre-trained CNN models were adopted to classify the digital otoscope images without the cumbersome feature extraction and selection processes. We have thus achieved a fully heuristic diagnostic model for otology in this study.
The successful implementation of deep CNN models in clinical decision support systems is difficult, as the models require reasonably large-scale data sets to ensure robust diagnostic outcomes [19]. The collection of large-scale data from clinics is quite an expensive and difficult task, and at times may even be impossible [21]. Decision support systems have not been actively adopted by practitioners in the field of otology, as the importance of diagnostic systems has not yet been fully comprehended. However, the use of a consistent and heuristic diagnostic system in daily clinical applications offers several advantages, such as increasing the physicians' diagnostic accuracy, reducing the misdiagnosis rate, supporting the decision-making process, and enabling standard and objective examinations.
6. Conclusion
OM is the technical term for inflammation of the middle ear. The eardrum is examined visually in clinical practice, which leads to variability among the observers during diagnosis, includes human-induced errors, and is subjective. It is evident that computerized methods for the clinical diagnosis of OM are not yet sufficiently widespread.
We addressed the above-mentioned disadvantages as follows:
(1) First, a new publicly accessible eardrum data set comprising
282 tympanic membrane images was introduced.
(2) A Faster R-CNN model was proposed to automatically select the tympanic membrane from the digital otoscope images, as this membrane contains most of the visual cues of the underlying disorder. This led us to achieve an AP of 75.85% in the tympanic membrane detection task.
(3) Pre-trained deep CNN models were adopted to distinguish
between the abnormal and normal samples, without the
cumbersome feature extraction and selection processes. The
AlexNet, VGGNets, GoogLeNet, and ResNets models were used
for this specific task. All models yielded satisfactory results,
with the best accuracy of 90.48% achieved by the VGG-16 model.
(4) A consistent diagnosis model for tympanic membrane detection and classification was thus developed. This approach can
potentially be used in future otological clinical decision support systems to enhance the diagnostic accuracy and reduce
the overall rate of misdiagnosis.
Future studies will focus on increasing the number of samples in the eardrum data set to cover a full range of otological conditions. This would enable us to realize a multi-class classification task in OM diagnosis. We will also focus on the activation maps of the deep CNN models to describe the samples and feed shallow networks.
Ethical approval
This article does not contain any data, or other information from
studies or experimentation, with the involvement of human or animal subjects.
Data availability
The otoscope images used in this study can be freely downloaded from http://www.ctganalysis.com/Category/otitis-media.
The annotations of the experts can be found on the same web site.
Funding
There is no funding source for this article.
Appendix A. Supplementary data
Supplementary material related to this article can be found, in the online version, at https://doi.org/10.1016/j.bspc.2019.101734.
Declaration of Competing Interest
The authors declare that there is no conflict of interest related to this paper.
References
[1] H.C. Myburgh, S. Jose, D.W. Swanepoel, C. Laurent, Towards low cost automated smartphone- and cloud-based otitis media diagnosis, Biomed. Signal Process. Control 39 (2018) 34–52, https://doi.org/10.1016/j.bspc.2017.07.015.
[2] E.B. Edetanlen, B.D. Saheeb, Otitis media with effusion in Nigerian children with cleft palate: incidence and risk factors, Br. J. Oral Maxillofac. Surg. (2018), https://doi.org/10.1016/j.bjoms.2018.11.015.
[3] A. Kørvel-Hanquist, A. Koch, J. Lous, S.F. Olsen, P. Homøe, Risk of childhood otitis media with focus on potentially modifiable factors: a Danish follow-up cohort study, Int. J. Pediatr. Otorhinolaryngol. 106 (2018) 1–9, https://doi.org/10.1016/j.ijporl.2017.12.027.
[4] R.C. Di Francesco, V.B. Barros, R. Ramos, Otitis media with effusion in children younger than 1 year, Rev. Paul. Pediatr. (English Ed.) 34 (2016) 148–153, https://doi.org/10.1016/j.rppede.2016.01.003.
[5] S. Mousseau, A. Lapointe, J. Gravel, Diagnosing acute otitis media using a smartphone otoscope; a randomized controlled trial, Am. J. Emerg. Med. 36 (2018) 1796–1801, https://doi.org/10.1016/j.ajem.2018.01.093.
[6] R.H. Eikelboom, M.N. Mbao, H.L. Coates, M.D. Atlas, M.A. Gallop, Validation of tele-otology to diagnose ear disease in children, Int. J. Pediatr. Otorhinolaryngol. 69 (2005) 739–744, https://doi.org/10.1016/j.ijporl.2004.12.008.
[7] A. Coleman, A. Cervin, Probiotics in the treatment of otitis media. The past, the present and the future, Int. J. Pediatr. Otorhinolaryngol. 116 (2019) 135–140, https://doi.org/10.1016/j.ijporl.2018.10.023.
[8] N.H. Davidoss, Y.K. Varsak, P.L. Santa Maria, Animal models of acute otitis media – a review with practical implications for laboratory research, Eur. Ann. Otorhinolaryngol. Head Neck Dis. 135 (2018) 183–190, https://doi.org/10.1016/j.anorl.2017.06.013.
[9] J. Pitaro, S. Waissbluth, M.-C. Quintal, A. Abela, A. Lapointe, Characteristics of children with refractory acute otitis media treated at the pediatric emergency department, Int. J. Pediatr. Otorhinolaryngol. 116 (2019) 173–176, https://doi.org/10.1016/j.ijporl.2018.10.045.
[10] E. Roy, K.Z. Hasan, F. Haque, A.K.M. Siddique, R.B. Sack, Acute otitis media during the first two years of life in a rural community in Bangladesh: a prospective cohort study, J. Heal. Popul. Nutr. 25 (2007) 414–421.
[11] A. Büyükcam, A. Kara, T. Bedir, B. Gülhan, H. Özdemir, M. Sütçü, M. Düzgöl, A. Arslan, T. Tekin, S. Çelebi, M.G. Kukul, G.İ. Bayhan, M. Köşker, A. Karbuz, M. Çelik, Z.K. Sütçü, Ö. Metin, S. Karakaşlılar, A. Dağlı, S.S. Kara, E. Albayrak, S. Kanık, H. Tezer, A. Parlakay, E. Çiftci, A. Somer, İ. Devrim, Z. Kurugöl, E.Ç. Dinleyici, P. Atla, Pediatricians' attitudes in management of acute otitis media and ear pain in Turkey, Int. J. Pediatr. Otorhinolaryngol. 107 (2018) 14–20, https://doi.org/10.1016/j.ijporl.2018.01.011.
[12] K. Kitamura, Y. Iino, Y. Kamide, F. Kudo, T. Nakayama, K. Suzuki, H. Taiji, H. Takahashi, N. Yamanaka, Y. Uno, Clinical Practice Guidelines for the diagnosis and management of acute otitis media (AOM) in children in Japan – 2013 update, Auris Nasus Larynx 42 (2015) 99–106, https://doi.org/10.1016/j.anl.2014.09.006.
[13] M.E. Pichichero, Diagnostic accuracy of otitis media and tympanocentesis skills assessment among pediatricians, Eur. J. Clin. Microbiol. Infect. Dis. 22 (2003) 519–524, https://doi.org/10.1007/s10096-003-0981-8.
[14] S. Shah-Becker, M.M. Carr, Current management and referral patterns of pediatricians for acute otitis media, Int. J. Pediatr. Otorhinolaryngol. 113 (2018) 19–21, https://doi.org/10.1016/j.ijporl.2018.06.036.
[15] A. Aksoy, E. Ayhan, Orta kulak efüzyonlarında timpanogram ile otoskopik bulguların karşılaştırılması [Comparison of tympanogram and otoscopic findings in middle ear effusions], Dicle Med. J. 40 (2013) 54–56, https://doi.org/10.5798/diclemedj.0921.2013.01.0224.
[16] I.P.O. Timoty Els, The Prevalence and Impact of Otitis Media With Effusion in Children Admitted for Adeno-tonsillectomy at Dr George Mukhari Academic Hospital, Pretoria, South Africa, 2018, pp. 76–80.
[17] N.S. Tsilis, P.V. Vlastarakos, V.F. Chalkiadakis, D.S. Kotzampasakis, T.P. Nikolopoulos, Chronic otitis media in children: an evidence-based guide for diagnosis and management, Clin. Pediatr. (Phila) 52 (2013) 795–802, https://doi.org/10.1177/0009922813482041.
[18] A.S. Lieberthal, A.E. Carroll, T. Chonmaitree, T.G. Ganiats, A. Hoberman, M.A. Jackson, M.D. Joffe, D.T. Miller, R.M. Rosenfeld, X.D. Sevilla, et al., The diagnosis and management of acute otitis media, Pediatrics (2013), peds–2012.
[19] L.S. Goggin, R.H. Eikelboom, M.D. Atlas, Clinical decision support systems and computer-aided diagnosis in otology, Otolaryngol. Head Neck Surg. 136 (2007) s21–s26, https://doi.org/10.1016/j.otohns.2007.01.028.
[20] A. Kuruvilla, J. Li, P. Hennings Yeomans, P. Quelhas, N. Shaikh, A. Hoberman, J. Kovačević, Otitis media vocabulary and grammar, in: 2012 19th IEEE Int. Conf. Image Process. (ICIP), 2012, pp. 2845–2848.
[21] Z. Cömert, A.F. Kocamaz, Fetal hypoxia detection based on deep convolutional neural network with transfer learning approach, in: R. Silhavy (Ed.), Softw. Eng. Algorithms Intell. Syst., Springer International Publishing, Cham, 2019, pp. 239–248, https://doi.org/10.1007/978-3-319-91186-1_25.
[22] A. Krizhevsky, I. Sutskever, G.E. Hinton, ImageNet classification with deep convolutional neural networks, in: F. Pereira, C.J.C. Burges, L. Bottou, K.Q. Weinberger (Eds.), Proc. 25th Int. Conf. Neural Inf. Process. Syst., Vol. 1, Curran Associates, Inc., USA, 2012, pp. 1097–1105, http://dl.acm.org/citation.cfm?id=2999134.2999257.
[23] K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition, arXiv preprint (2014), arXiv:1409.1556.
[24] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A.C. Berg, L. Fei-Fei, ImageNet large scale visual recognition challenge, Int. J. Comput. Vis. 115 (2015) 211–252, https://doi.org/10.1007/s11263-015-0816-y.
[25] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in: Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Las Vegas, NV, USA, 2016, pp. 770–778, https://doi.org/10.1109/CVPR.2016.90.
[26] A. Kuruvilla, N. Shaikh, A. Hoberman, J. Kovačević, Automated diagnosis of otitis media: vocabulary and grammar, J. Biomed. Imaging 2013 (2013) 27.
[27] C. Vertan, D.C. Gheorghe, B. Ionescu, Eardrum color content analysis in video-otoscopy images for the diagnosis support of pediatric otitis, in: ISSCS 2011 – Int. Symp. Signals, Circuits and Systems Proc., 2011, pp. 129–132, https://doi.org/10.1109/ISSCS.2011.5978676.
[28] H. Junior, E. Comunello, S. Costa, C.C. Dornelles, Computational techniques for accompaniment and measuring of otology pathologies, in: Twentieth IEEE Int. Symp. on Computer-Based Medical Systems, IEEE, Maribor, Slovenia, 2007.
[29] Y.K. Huang, C.P. Huang, A depth-first search algorithm based otoscope application for real-time otitis media image interpretation, in: Parallel Distrib. Comput. Appl. Technol. (PDCAT) Proc. 2017, 2018, pp. 170–175, https://doi.org/10.1109/PDCAT.2017.00036.
[30] C.K. Shie, H.T. Chang, F.C. Fan, C.J. Chen, T.Y. Fang, P.C. Wang, A hybrid feature-based segmentation and classification system for the computer aided self-diagnosis of otitis media, in: 2014 36th Annu. Int. Conf. IEEE Eng. Med. Biol. Soc. (EMBC), 2014, pp. 4655–4658, https://doi.org/10.1109/EMBC.2014.6944662.
[31] I. Mironica, C. Vertan, D.C. Gheorghe, Automatic pediatric otitis detection by classification of global image features, in: 2011 E-Health and Bioengineering Conf., 2011, pp. 1–4.
[32] L. Cheng, J. Liu, C.E. Roehm, T.A. Valdez, Enhanced video images for tympanic membrane characterization, in: Proc. Annu. Int. Conf. IEEE Eng. Med. Biol. Soc. (EMBS), 2011, pp. 4002–4005, https://doi.org/10.1109/IEMBS.2011.6090994.
[33] H.C. Myburgh, W.H. van Zijl, D. Swanepoel, S. Hellström, C. Laurent, Otitis media diagnosis for developing countries using tympanic membrane image-analysis, EBioMedicine 5 (2016) 156–160, https://doi.org/10.1016/j.ebiom.2016.02.017.
[34] H.C. Myburgh, S. Jose, D.W. Swanepoel, C. Laurent, Towards low cost automated smartphone- and cloud-based otitis media diagnosis, Biomed. Signal Process. Control 39 (2018) 34–52, https://doi.org/10.1016/j.bspc.2017.07.015.
[35] Y. Guo, Ü. Budak, L.J. Vespa, E. Khorasani, A. Şengür, A retinal vessel detection approach using convolution neural network with reinforcement sample learning strategy, Measurement 125 (2018) 586–591, https://doi.org/10.1016/j.measurement.2018.05.003.
[36] D. Macêdo, C. Zanchettin, A.L.I. Oliveira, T. Ludermir, Enhancing batch normalized convolutional networks using displaced rectifier linear units: a systematic comparative study, Expert Syst. Appl. 124 (2019) 271–281, https://doi.org/10.1016/j.eswa.2019.01.066.
[37] C. Xu, J. Yang, H. Lai, J. Gao, L. Shen, S. Yan, UP-CNN: un-pooling augmented convolutional neural network, Pattern Recognit. Lett. 119 (2019) 34–40, https://doi.org/10.1016/j.patrec.2017.08.007.
[38] A. Lumini, L. Nanni, Deep learning and transfer learning features for plankton classification, Ecol. Inform. 51 (2019) 33–43, https://doi.org/10.1016/j.ecoinf.2019.02.007.
[39] L. Wang, Y. Yang, R. Min, S. Chakradhar, Accelerating deep neural network training with inconsistent stochastic gradient descent, Neural Netw. 93 (2017) 219–229, https://doi.org/10.1016/j.neunet.2017.06.003.
[40] M.D. Zeiler, ADADELTA: an adaptive learning rate method, 2012, https://doi.org/10.1145/1830483.1830503.
[41] R. Shindjalova, K. Prodanova, V. Svechtarov, Modeling data for tilted implants in grafted with Bio-Oss maxillary sinuses using logistic regression, AIP Conf. Proc. 1631 (2014) 58–62, https://doi.org/10.1063/1.4902458.
[42] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, A. Rabinovich, Going deeper with convolutions, in: 2015 IEEE Conf. Comput. Vis. Pattern Recognit., IEEE, 2015, pp. 1–9, https://doi.org/10.1109/CVPR.2015.7298594.
[43] R.B. Girshick, J. Donahue, T. Darrell, J. Malik, Rich feature hierarchies for accurate object detection and semantic segmentation, in: 2014 IEEE Conf. Comput. Vis. Pattern Recognit., 2014, pp. 580–587.
[44] C.L. Zitnick, P. Dollár, Edge boxes: locating object proposals from edges, in: D. Fleet, T. Pajdla, B. Schiele, T. Tuytelaars (Eds.), Comput. Vis. – ECCV 2014, Springer International Publishing, Cham, 2014, pp. 391–405.
[45] R. Girshick, Fast R-CNN, in: Proc. IEEE Int. Conf. Comput. Vis., 2015, pp. 1440–1448.
[46] S. Ren, K. He, R. Girshick, J. Sun, Faster R-CNN: towards real-time object detection with region proposal networks, IEEE Trans. Pattern Anal. Mach. Intell. 39 (2017) 1137–1149, https://doi.org/10.1109/TPAMI.2016.2577031.
[47] P. Ding, Y. Zhang, W.-J. Deng, P. Jia, A. Kuijper, A light and faster regional convolutional neural network for object detection in optical remote sensing images, ISPRS J. Photogramm. Remote Sens. 141 (2018) 208–218, https://doi.org/10.1016/j.isprsjprs.2018.05.005.
[48] Ü. Budak, Ö.F. Alçin, M. Aslan, A. Şengür, Optic disc detection in retinal images via faster regional convolutional neural networks, in: 1st Int. Eng. Technol. Symp., 2018.
[49] Z. Cömert, A.F. Kocamaz, V. Subha, Prognostic model based on image-based time-frequency features and genetic algorithm for fetal hypoxia assessment, Comput. Biol. Med. (2018), https://doi.org/10.1016/j.compbiomed.2018.06.003.
[50] T.C.W. Landgrebe, R.P.W. Duin, Efficient multiclass ROC approximation by decomposition via confusion matrix perturbation analysis, IEEE Trans. Pattern Anal. Mach. Intell. 30 (2008) 810–822, https://doi.org/10.1109/TPAMI.2007.70740.