Pattern Recognition 48 (2015) 2738–2750
Automated analysis and diagnosis of skin melanoma on whole slide
histopathological images
Cheng Lu a, Mrinal Mandal b
a College of Computer Science, Shaanxi Normal University, Xi'an, Shaanxi Province 710119, China
b Department of Electrical and Computer Engineering, University of Alberta, Edmonton, Alberta, Canada T6G 2V4
Article history: Received 12 February 2013; received in revised form 19 August 2014; accepted 25 February 2015; available online 5 March 2015.

Abstract
Melanoma is the most aggressive type of skin cancer, and the pathological examination remains the gold
standard for the final diagnosis. Traditionally, the histopathology slides are examined under a
microscope by pathologists, which typically leads to inter- and intra-observer variations. In addition, it
is time consuming and tedious to analyze a whole glass slide manually. In this paper, we propose an
efficient technique for automated analysis and diagnosis of the skin whole slide image. The proposed
technique consists of five modules: epidermis segmentation, keratinocytes segmentation, melanocytes
detection, feature construction and classification. Since the epidermis, keratinocytes and melanocytes
are important cues for the pathologists, these regions are first segmented. Based on the segmented
regions of interest, the spatial distribution and morphological features are constructed. These features,
representing a skin tissue, are classified by a multi-class support vector machine classifier. Experimental
results show that the proposed technique provides satisfactory performance (about 90%
classification accuracy) and can assist pathologists in skin tissue analysis and diagnosis.
© 2015 Elsevier Ltd. All rights reserved.
Keywords: Histopathological image analysis; Object detection; Image analysis; Skin cancer; Melanoma
1. Introduction
Cancer is a major cause of death all over the world. It is caused
by the uncontrolled growth of abnormal cells in the body. Skin
cancer is one of the most frequent types of cancer, and melanoma is
the most aggressive type of skin cancer [17]. According to a recent
article, approximately 70,000 people are diagnosed with melanoma
skin cancer, and about 9000 die from it in the United States alone
every year [1]. Malignant melanoma is curable if it is diagnosed
at an early stage [17]. Therefore, early detection and accurate
prognosis of malignant melanoma will help lower
the mortality from this cancer. Approaches to melanoma diagnosis
have dynamically evolved during the past 25 years [22]. Although
there are many new emerging techniques, e.g., dermatoscopy [29],
that could provide initial diagnosis, the pathological examination
remains the gold standard for the final diagnosis. In addition, useful
prognostic information in the clinical management of the patient
can also be provided by the histological examination.
Histopathology slides provide a cellular-level view of the
diseased cells and tissue, and are considered the "gold standard" in the
diagnosis of almost all kinds of cancer [12]. Traditionally,
the histopathology slides are examined under a microscope by
pathologists. With the help of the whole slide histology digital
scanners, glass slides of tissue specimen can now be digitized at high
magnification to create the whole slide image (WSI) [23,32]. Such high
resolution images are similar to what a pathologist observes under a
microscope to diagnose the biopsy. The pathologists can now do the
examination via the WSI instead of using the microscope. Note that
the WSI requires large storage space and substantial computing power for
processing. For example, a 20 mm × 20 mm glass slide tissue scanned at a
resolution of 0.11625 μm/pixel (at 40× magnification) consists of
about 2.96 × 10^10 pixels, and requires approximately 80 GB of
storage space in uncompressed color format (24 bits/pixel). Therefore,
it is time consuming and difficult to analyze a WSI manually. In
addition, the manual diagnosis is subjective and often leads to intra-observer and inter-observer variability [9].
In order to address the above-mentioned problems, automated
computational tools and techniques that can provide reliable,
reproducible, and objective results for quantitative analysis are needed.
Recently, many researchers have applied sophisticated digital image
analysis techniques to extract objective and accurate prognostic
clues throughout the WSI. These computer-aided systems have
shown promising initial results in the case of breast cancer [21,8],
prostate cancer [4], head and neck cancer [18], cervical cancer [30],
neuroblastoma [25,10], and ovarian cancer [26].
In this work, we propose an automatic analysis and diagnosis
technique for the whole slide skin pathological image. The goal of
the proposed technique is to provide quantitative measures that help pathologists in their diagnosis, and to classify melanoma, melanotic nevus and normal skin biopsies automatically.
This paper is organized as follows. Section 2 presents related
work on automatic WSI analysis and skin tissue analysis.
Section 3 presents the proposed technique, which consists of five modules.
The image dataset and experimental results are presented in Section 4, followed by the
conclusion in Section 5.
2. Related works
2.1. WSI analysis techniques
The whole slide histopathological image analysis has become an
attractive research topic in recent years [25,10,4]. The WSI is able to
provide global information of a tissue specimen for quantitative
image analysis. However, the automated WSI analysis is challenging
since it has high computational complexity. Several techniques based
on the multi-resolution framework have been proposed. Table 1
shows the state-of-the-art literature on the WSI analysis techniques.
Petushi et al. [21] employed gray scale conversion, adaptive
thresholding and morphological operations to segment the nuclei
and identify the high nuclei density regions in the invasive breast
carcinoma WSIs. Classification of the tissue type is then performed
based on the features extracted from pre-segmented areas using
different classifiers. A classification accuracy of 68% has been reported
on a database of 24 WSIs.
Mete et al. [18] proposed a block-based supervised technique for
detection of malignant regions in histopathological head and neck
slides. With the pre-defined training subimages (128 128 pixels),
this technique first extracts the most prominent colors that are
present in the positive and negative training samples. These colors
are then clustered into several groups that are used to train the SVM.
For malignancy detection in a candidate image, the color information
is extracted and is classified by the pre-trained SVM. The technique
provides a good performance. However, the performance may suffer
from the staining color variations of the WSIs.
Wang et al. [30] developed an automated computer-aided technique for the diagnosis of cervical intraepithelial neoplasia. This
technique first segments the epithelium using prior knowledge of
the tissue distribution. Based on the measurements and features
obtained from the squamous epithelium, the SVM is employed to
perform the classification. This CAD technique has been reported to
achieve 94.25% accuracy for four-class tissue classification on 31
digital whole slides.
Sertel et al. [25] developed a multi-scale CAD technique for
classification of stromal development for neuroblastoma. This technique uses texture features extracted using co-occurrence statistics and
local binary patterns. A modified K-NN classifier was employed to
determine the classification confidence at each resolution. The experimental results showed an overall classification accuracy of 88.4%. Kong
et al. [10] developed a similar CAD technique for classification of the
grades of neuroblastic differentiation. This technique first segments a
WSI into multiple cytological components (e.g., nuclei, cytoplasm,
RBCs) at each resolution level using an expectation-maximization
approach. The cytological and statistical features derived from the pre-segmented results are then fed to a multi-classifier combiner for
training. The trained classifier is tested on 33 WSIs with
87.88% accuracy.
Roullier et al. [24] proposed a multi-resolution graph-based
analysis framework for the WSI analysis of breast cancer. The
2-means clustering is applied to the histogram of the regularized
image to separate the region of interest from the background. Spatial
refinement based on discrete label regularization is used to
achieve accurate segmentation around the boundary. The above
steps are repeated at four different resolutions (from lower
to higher). Finally, the mitoses are identified in the labeled region
of interest using the domain knowledge that mitoses are
visually recognized by their red-cyan color difference.
2.2. Skin tissue analysis techniques
Since different tissue types have different specific diagnostic
features considered by pathologists, the techniques mentioned
in Section 2.1 cannot be directly applied to skin tissue analysis and
diagnosis.
Unlike other types of tissue specimen, a typical skin tissue slide
consists of three main parts: epidermis, dermis and sebaceous
tissues. The anatomy of a typical skin tissue is shown in Fig. 1,
where the lower image shows the manually labeled contour of the
epidermis.
The epidermis area is an important observation area in skin
melanoma diagnosis. The morphological features and distribution of
the objects of interest are the most useful cues in diagnosis. Two examples
of skin WSI are shown in Fig. 2. Fig. 2(a) shows an example of
melanocytic nevus, where melanocytes are invading into the dermis
area (the invading melanocytes appear as the dark blue area in
dermis). Fig. 2(b) shows an example of superficial spreading melanoma, where the image looks like normal skin tissue unless the
epidermis area is examined carefully. For automated diagnosis of the
skin tissue, the existing techniques mentioned in Section 2.1 are not
expected to produce satisfactory results, since most of them
focus on color or texture features at the pixel level [18,25,10,30].
Table 1
Related works on whole slide histopathology image analysis.

Tissue type       Reference                  Dataset info.
Breast            Petushi et al. [21]        24 H&E stained, 20×
Head and neck     Mete et al. [18]           7 H&E stained, 20×
Cervical          Wang et al. [30]           31 H&E stained, 40×
Neuroblastoma     Sertel et al. [25]         43 H&E stained, 40×
Neuroblastoma     Kong et al. [10]           33 H&E stained, 40×
Breast            Roullier et al. [24]       H&E stained, 20×
Skin              Smolle and Gerger [28]     H&E stained, 10×
Fig. 1. The anatomy of a skin tissue.
Fig. 2. Two examples of skin WSI. (a) An example of melanocytic nevus. (b) An example of superficial spreading melanoma. (For interpretation of the references to color in
this figure, the reader is referred to the web version of this paper.)
Fig. 3. The overall schematic of the proposed technique.
A few research works have been reported on the histopathological image (HPI) analysis of
skin cancer. Smolle [27] performed a pilot study on
digital skin histopathological image analysis. A selected portion of
the whole skin tissue slide is first digitized into an image. This
image of the skin specimen is then divided into non-overlapping
blocks of equal size and shape. A set of grey-level, color and
Haralick texture features [6] is calculated for each image block.
Finally, based on the extracted features, multidimensional linear
discriminant analysis and hierarchical clustering are applied to classify the
image blocks into two classes: normal and malignant.
Smolle and Gerger [28] extended the block-based analysis
scheme by applying tissue counter analysis (TCA) to
histologic sections of skin tumors. The basic idea of TCA is to
"count" the classified image blocks in a given histological specimen. The histologic specimen is then labeled as "benign" or
"malignant", based on the relative proportion of image blocks that
belong to different classes. Experimental results showed that the
TCA might be a useful method for the interpretation of melanocytic skin tumors.
The above-mentioned skin tissue analysis techniques only
reveal the feasibility of texture and color features and TCA for
classification of different types of sampled skin area from the
whole glass slide. There are several limitations of these skin tissue
analysis techniques [27,28]. First, these techniques only consider
the pixel-level features (i.e., texture and color features) of the
sampled area. However, the object-level features, e.g., the distribution of melanocytes, are of interest to the pathologist. Second,
these techniques may fail when there exists staining variation.
Third, these techniques only consider a representative region
within the whole glass slide, and therefore appropriate selection
of the representative region requires user interaction.
3. The proposed technique
3.1. The overview of the proposed technique
The schematic of the proposed technique is shown in Fig. 3. The
proposed technique consists of five modules. Given a WSI, the first
three cascade modules are used for the segmentation and detection
of ROIs. The epidermis area segmentation module aims to segment the
skin epidermis in the WSI. The keratinocytes segmentation module is
applied to segment the keratinocytes. Note that the keratinocytes are
cells (which also include melanocytes) present in the epidermis area.
The melanocytes detection module is used to detect the melanocytes
from the pre-segmented keratinocytes. Based on the pre-segmented
epidermis area, keratinocytes, and melanocytes, the features construction module will consider the spatial arrangement and morphological characteristics of the pre-segmented ROIs and construct the
features according to the diagnostic factors used by pathologists. The
last module, i.e., the classification module, utilizes the features to
classify the tissues into three categories: melanoma, nevus, or normal
skin. Note that before processing, we perform a color
normalization operation [19] on all images.
3.2. Epidermis area segmentation
The epidermis area is the most important observation area for
the diagnosis of a skin tissue. In most cases of melanoma, the
grading of the cancer can be made by analyzing the architectural
and morphological features of atypical cells in the epidermis or
epidermis-dermis junctional area. Therefore, segmentation of the
epidermis area is an important step before further analysis is
performed.
In one of our previous works [16], an efficient technique for
automatic epidermis area segmentation and analysis has been
proposed. This technique is adopted here for the epidermis segmentation and high resolution (HR) image tile generation. The
schematic of the epidermis area segmentation technique is shown
in Fig. 4 and explained as follows [16]:
Down sampling: In order to reduce the processing time, the
high resolution (HR) WSI is first down sampled into a low
resolution (LR) WSI.
Segmentation of epidermis: In this step, the red channel of the
WSI is selected for the segmentation since it provides the most
discrimination between the epidermis and other regions [16].
Otsu's threshold method [20] is then applied for the initial
segmentation. In order to eliminate the false positive regions
after the thresholding, two criteria based on the area size and
shape are used to determine the real epidermis area. Finally,
morphological operations (e.g., opening and closing) are applied
to fill in the holes and smooth the boundary. An epidermis mask
is thus obtained (a simplified code sketch of this step follows the list below).
Analysis of epidermis layout: Based on the epidermis mask, the
epidermis layout is determined. The layout of epidermis is
defined as horizontal or vertical. The entire epidermis mask is
first divided into small grids. Within each grid, the percentage
of bright pixels is counted. If the percentage of the bright pixels
is greater than a threshold, this grid is assigned a value 1,
otherwise 0. A straight line is then fitted through the grids with
value 1 to determine the acute angle between the epidermis
and the horizontal line. If the angle is greater than or equal to
45°, the epidermis layout is vertical, otherwise horizontal.
Generation of HR image tiles: The segmented epidermis in LR is
first mapped to HR. The whole segmented epidermis area is
then divided into several image tiles for further analysis. Based
on the layout of the epidermis, the HR image tiles are generated
across the epidermis area so that at each image tile all sublayers of the epidermis can be observed. If the layout is vertical
(horizontal), we generate the image tiles by stitching the image
blocks with segmented epidermis mask horizontally (vertically). An example is illustrated in Fig. 5.
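As referenced above, the following minimal sketch (in Python with scikit-image, whereas the original system was implemented in MATLAB) illustrates the low-resolution epidermis segmentation step: Otsu thresholding of the red channel followed by area/shape filtering and morphological clean-up. The elongation threshold and size limits are illustrative placeholders, not the paper's values.

```python
import numpy as np
from skimage.filters import threshold_otsu
from skimage.measure import label, regionprops
from skimage.morphology import binary_closing, binary_opening, disk, remove_small_holes

def segment_epidermis_lr(rgb_lr, min_area=5000, min_elongation=2.0):
    """Rough epidermis mask from a low-resolution RGB slide (sketch).

    min_area and min_elongation stand in for the paper's area/shape criteria;
    they must be tuned for the actual downsampling factor.
    """
    red = rgb_lr[:, :, 0].astype(float)
    # The densely stained epidermis is assumed darker in the red channel,
    # so keep pixels below Otsu's threshold.
    mask = red < threshold_otsu(red)

    # Keep only large, elongated components (the epidermis is a long thin band).
    keep = np.zeros_like(mask, dtype=bool)
    for region in regionprops(label(mask)):
        elongation = region.major_axis_length / max(region.minor_axis_length, 1e-6)
        if region.area >= min_area and elongation >= min_elongation:
            keep[tuple(region.coords.T)] = True

    # Morphological clean-up: fill holes and smooth the boundary.
    keep = binary_closing(keep, disk(5))
    keep = binary_opening(keep, disk(3))
    keep = remove_small_holes(keep, area_threshold=2000)
    return keep
```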
In Fig. 5, the binary mask represents the segmented epidermis
area, where the white region indicates the epidermis area and the
black region indicates the background and other regions. The layout
of the epidermis is horizontal, and therefore the image tiles are
generated vertically. The thick rectangles indicate the generated
image tiles for further processing. Some of the snapshots of the HR
image tiles are shown in Fig. 5.
Three epidermis area segmentation results are shown in Fig. 6. In
Fig. 6(a), (c), and (e), the original images with manually labeled
epidermis area (shown as bright contour) are shown. Fig. 6(b), (d),
and (f) shows the binary mask that represents the segmentation
result.
3.3. Keratinocytes segmentation
Based on the HR image tiles obtained from Section 3.2, it is now
possible to segment the keratinocytes in the epidermis area.
Considering the intensity variations, an adaptive threshold technique is applied to segment the nuclei regions [13]. This technique
has two main steps which are described as follows.
3.3.1. Hybrid gray-scale morphological reconstruction (HGMR)
In order to reduce the influence from undesirable variations in
the image and to make the nuclei region homogeneous, a hybrid
gray-scale morphological reconstruction method is applied. The
steps of HGMR are described below:
Fig. 4. The schematic for epidermis area segmentation module.
2742
C. Lu, M. Mandal / Pattern Recognition 48 (2015) 2738–2750
Fig. 5. An example of generating the image tiles from the binary epidermis mask (shown in the center). Note that the layout of the epidermis is horizontal in this example,
and hence the image tiles are generated vertically. The rectangles indicate the generated image tiles for further processing. Some of the snapshots of the image tiles are
present.
Fig. 6. Three examples of the epidermis segmentation results. (a), (c), and (e) show original images with manually labeled epidermis area (shown as bright contour). (b), (d),
and (f) show the binary images that represent the segmentation result where the white region indicates the epidermis area, and the black region indicates the background
and other regions.
C. Lu, M. Mandal / Pattern Recognition 48 (2015) 2738–2750
(a) Complement of the image: Assuming an 8-bit image R, the complement image $\bar{R}$ is calculated as follows:

$\bar{R}(x, y) = 255 - R(x, y)$   (1)

where (x, y) is the pixel coordinate.
(b) Opening-by-reconstruction: In order to enhance the nuclei regions, the opening-by-reconstruction operation [5] is performed on the image $\bar{R}$ as follows:

$\bar{R}_{obr} = \mathcal{R}(\bar{R}_{e}, \bar{R})$   (2)

where $\mathcal{R}(\cdot,\cdot)$ is the morphological reconstruction operator [5], $\bar{R}_{e} = \bar{R} \ominus S$ (where $\ominus$ is the erosion operator), and S is the structuring element. The eroded image obtained by applying erosion to Fig. 7(b) is shown in Fig. 7(c). The result of the opening-by-reconstruction is shown in Fig. 7(d). Comparing Fig. 7(d) and (b), it is observed that the nuclei regions have been enhanced.
(c) Closing-by-reconstruction: In order to further reduce the noise, the closing-by-reconstruction is performed on $\bar{R}_{obr}$ as follows:

$\bar{R}_{obrcbr} = 255 - \mathcal{R}\left((255 - \bar{R}_{obr}) \ominus S,\ 255 - \bar{R}_{obr}\right)$   (3)

The result of the closing-by-reconstruction is shown in Fig. 7(g). Note that the intensity within the nuclei regions is more homogeneous compared to that in Fig. 7(d).
(d) Complement of the image: This step computes the complement of $\bar{R}_{obrcbr}$ in order to map the image back into the original intensity space, i.e., $R'(x, y) = 255 - \bar{R}_{obrcbr}(x, y)$. The final image is shown in Fig. 7(h).
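The HGMR chain of Eqs. (1)-(3) can be sketched as follows, using grayscale reconstruction from scikit-image in place of the operator $\mathcal{R}$; the structuring-element radius is an assumed placeholder, and the original implementation was in MATLAB.

```python
import numpy as np
from skimage.morphology import disk, erosion, reconstruction

def hgmr(gray_u8, se_radius=4):
    """Hybrid gray-scale morphological reconstruction (Eqs. (1)-(3)), sketch.

    gray_u8: 8-bit grayscale image tile with dark nuclei on a bright background.
    se_radius: radius of the structuring element S (illustrative value).
    """
    se = disk(se_radius)
    r_bar = 255 - gray_u8.astype(np.int32)                       # Eq. (1): complement

    # Eq. (2): opening-by-reconstruction, marker = erosion of the complement.
    r_obr = reconstruction(erosion(r_bar, se), r_bar, method='dilation')

    # Eq. (3): closing-by-reconstruction, realized through the complement of r_obr.
    c = 255 - r_obr
    r_obrcbr = 255 - reconstruction(erosion(c, se), c, method='dilation')

    # Step (d): map back to the original intensity space.
    return (255 - r_obrcbr).astype(np.uint8)
```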
3.3.2. Local region adaptive threshold selection
For initial segmentation, a global threshold (Otsu's threshold
method [20]) is first applied on the image. We have found that the
touching of multiple nuclei often results in an abnormally large
region (ALR), and degrades the segmentation performance. To
address this, a local region adaptive threshold selection (LRATS)
method has been proposed to achieve finer segmentation [13]. In
LRATS, a local threshold is determined by minimizing a pre-defined cost function. Denote an ALR as L and the kth fragment obtained by a threshold t within the ALR as $L_k(t)$. The cost C(t) for the current threshold t is calculated as follows:

$C(t) = \frac{1}{K}\sum_{k=1}^{K}\left[\Phi_E(L_k(t)) + \Phi_A(L_k(t))\right]$   (4)

where K is the total number of segmented fragments obtained by the current threshold t, and $\Phi_E(L_k(t))$ and $\Phi_A(L_k(t))$ are two penalty functions that correspond to the ellipticity and area of the regions, respectively. Intuitively, if the segmented fragments are close to elliptical in shape and their areas are within the predefined range [Amin, Amax], the cost function C(t) will have a small value.
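The following sketch shows how the cost of Eq. (4) could be evaluated for one candidate threshold inside an ALR. The exact forms of the penalty functions $\Phi_E$ and $\Phi_A$ are given in [13] rather than here, so the ellipticity and area penalties below are illustrative assumptions.

```python
import numpy as np
from skimage.measure import label, regionprops

def lrats_cost(alr_gray, t, a_min=100, a_max=500):
    """Cost C(t) of Eq. (4) for a candidate threshold t inside an ALR (sketch)."""
    fragments = regionprops(label(alr_gray < t))    # nuclei assumed darker than background
    if not fragments:
        return np.inf
    cost = 0.0
    for frag in fragments:
        # Illustrative ellipticity penalty: relative deviation of the fragment
        # area from the area of its best-fit ellipse.
        ellipse_area = np.pi * frag.major_axis_length * frag.minor_axis_length / 4.0
        phi_e = abs(frag.area - ellipse_area) / max(ellipse_area, 1e-6)
        # Illustrative area penalty: zero inside [a_min, a_max], growing outside it.
        if frag.area < a_min:
            phi_a = (a_min - frag.area) / a_min
        elif frag.area > a_max:
            phi_a = (frag.area - a_max) / a_max
        else:
            phi_a = 0.0
        cost += phi_e + phi_a
    return cost / len(fragments)

# The local threshold is the minimizer over a set of candidate gray levels, e.g.
# t_star = min(range(60, 200, 5), key=lambda t: lrats_cost(alr_gray, t))
```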
An epidermis area is shown in Fig. 8(a), and the keratinocytes
segmentation result is shown in Fig. 8(b).
3.4. Melanocytes detection
In the skin melanoma diagnosis, the architectural and cellular
features (e.g., size, distribution, location) of the melanocytes in the
epidermis area are important factors. In this module, the melanocytes are detected from the pre-segmented keratinocytes.
In the epidermis area of skin, a normal melanocyte is typically a cell
with a dark nucleus, lying singly in the basal layer of the epidermis. However,
in the melanoma or nevus, the melanocytes grow abnormally, and can
be found in the middle layer of epidermis. The digitized histopathological images we used in this work are stained with hematoxylin and
eosin (H&E). Three examples of skin epidermis images are shown in
Fig. 9(a), (b), and (c). The cell nuclei are observed as dark blue whereas
the intra-cellular material and cytoplasm are observed as bright pink.
Note that the bright seed points indicate the locations of melanocytes,
whereas the other nuclei belong to other keratinocytes.
It is observed that the main difference between melanocytes and other
keratinocytes lies in the surrounding region. The melanocytes generally
have brighter, halo-like surrounding regions and are retracted from
other cells, due to the shrinkage of cytoplasm [31,14]. One closeup
example of a melanocyte is shown in Fig. 9(d), where the outer
dotted contour represents the halo region and the inner solid contour
represents the nucleus. In contrast, the other keratinocytes are in close
contact with the surrounding cytoplasm and have little or no brighter area. The
brighter halo-like region of the melanocyte is an important pattern
for differentiation of the melanocytes and other keratinocytes.
In this module, we adopted one of our previous works for the
melanocytes detection [15]. The schematic of the melanocyte detection
technique is shown in Fig. 10. Based on the pre-segmented keratinocytes, the melanocyte detection technique first estimates the outer
boundary of the halo region using the radial line scanning (RLS) method. At
first, the RLS method initializes the radial center and radial lines for
each pre-segmented keratinocyte (see Fig. 11(c)). On each radial line,
the gradient information is calculated and the outer boundary points of
Fig. 7. Example of hybrid gray-scale morphological reconstruction: (a) original image, (b) complement of the image, (c) image after erosion, (d) opening-by-reconstruction, (e) complement of the image, (f) image after erosion, (g) closing-by-reconstruction, and (h) final result.
Fig. 8. An example of keratinocytes segmentation result in epidermis area: (a) shows a small portion of the epidermis in WSI, (b) shows the automatic segmented
keratinocytes (indicated by the thick bright contours) within the epidermis area (indicated by the thick black line), and (c) shows the automatic segmented melanocytes
(indicated by the thick bright contours).
the halo region are estimated (the inner boundary is already obtained
by the keratinocytes segmentation module). Intuitively, the point on a
radial line whose gradient direction is going inwards is determined as
the outer boundary point of the halo. Examples of the estimated halo
regions are shown in Fig. 11(e). Once the halo region is estimated, the
melanocytes are then determined based on the ratio of the estimated
halo region and the original nuclei region. The ratio, $R_{HN}$, is calculated
as follows:

$R_{HN} = A_{HR} / A_{NR}$   (5)

where $A_{HR}$ and $A_{NR}$ are the areas of the halo and nuclei regions,
respectively. For the melanocyte detection, a nucleus region is selected
as a melanocyte if its $R_{HN}$ value is greater than a predefined
threshold $T_{RHN}$. Examples of the $R_{HN}$ values are shown in Fig. 11(e). A
melanocytes detection result is shown in Fig. 8(c).
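Once the halo boundary has been estimated, the decision of Eq. (5) is a simple area-ratio test. The sketch below assumes the nucleus and the (annular) halo region are available as binary masks; the radial-line halo estimation itself is described in [15] and not repeated here.

```python
import numpy as np

def is_melanocyte(nucleus_mask, halo_mask, t_rhn=0.8):
    """Classify a nucleus as melanocyte using the ratio of Eq. (5) (sketch).

    nucleus_mask: binary mask of the segmented nucleus region.
    halo_mask:    binary mask of the estimated halo region surrounding the
                  nucleus (assumed here to exclude the nucleus itself).
    t_rhn:        threshold T_RHN (0.8 in the paper's experiments).
    """
    a_nr = np.count_nonzero(nucleus_mask)
    if a_nr == 0:
        return False
    r_hn = np.count_nonzero(halo_mask) / a_nr     # R_HN = A_HR / A_NR
    return r_hn > t_rhn
```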
The spatial distribution of the melanocytes in epidermis area is an
important diagnostic factor. In normal skin epidermis, the melanocytes are evenly distributed along the basal layer of the epidermis. The
ratio between the melanocytes and the other keratinocytes in skin tissue
is between 1:10 and 1:20. In the case of melanotic nevus,
there is a large number of melanocytes and these melanocytes may
form several nests in the junction of epidermis and dermis area or
penetrate into the dermis area. In the case of melanoma, the number
of the melanocytes is large as well, but not as large as that in the case
of nevus. However, the melanocytes may invade into the upper layers
of the epidermis or invade deep into the dermis area. Based on the
characteristics mentioned above, we utilize the number and location
of the pre-segmented ROIs, i.e., the keratinocytes and melanocytes, to construct the spatial distribution features for further analysis.
In order to calculate the spatial distribution, we first determine
the thickness of the epidermis. The spatial distribution features are
then generated.
3.5. Features construction: spatial distribution features
In the previous modules, the epidermis area, the keratinocytes and
the melanocytes in the epidermis area are segmented and detected by
the automated techniques. These ROIs are now ready to be further
analyzed by computer-aided techniques based on the pathologist's
evaluation procedure. In the manual evaluation, several important
factors are considered by a pathologist. In the proposed technique,
these important factors are considered for the classification.
3.5.1. Local thickness measurement
The thickness of epidermis in skin varies significantly, and
therefore the thickness of epidermis needs to be measured locally.
The measurement method is described below:
Determine the boundary of epidermis base: At first, based on the
layout of the epidermis area, horizontal or vertical, the outer
Fig. 9. Melanocytes in epidermis area from different skin samples. Inter- and intra-image variations are observed in terms of the color. These images are sampled from the
skin WSI. In (a), (b), and (c), the bright seed points indicate the location of melanocytes whereas other nuclei are keratinocytes. (d) is a close up image of a melanocyte. (For
interpretation of the references to color in this figure caption, the reader is referred to the web version of this paper.)
Fig. 10. The schematic of the melanocyte detection technique.
boundary of epidermis is determined. Fig. 12(a) shows a horizontal layout skin tissue and Fig. 12(b) shows a zoomed version
of local epidermis area. The boundary of epidermis base is
highlighted in Fig. 12(b).
Assign sampling points: On the pre-determined base boundary,
we select a few points by fixed-interval sampling. Assuming
that there are K points on the outer boundary B, and setting the
interval to ε = 5, we obtain a set of boundary points as follows:

$\Lambda = \{ p_i \in B \mid i = \varepsilon j,\ j = 1, \ldots, \lfloor K/\varepsilon \rfloor \}$   (6)
Fig. 12(c) illustrates the sampling points (shown as bright dots)
on the epidermis base boundary.
Measure local thickness: Based on the sampling points on the
base boundary, the thickness of the epidermis is measured.
Given a sampling point $p_i \in B$, we obtain its neighboring
sampling points $p_{i-1}$ and $p_{i+1}$. A curve C is estimated based
on the three points $p_i$, $p_{i-1}$ and $p_{i+1}$ using a polynomial curve
fitting method. A line $T_i$, which is tangent to the fitted curve at
point $p_i$, is then calculated. The line $N_i$ normal to the tangent
line $T_i$ will reflect the orientation of the local epidermis area. The
local depth measure $D_i$ is calculated as the distance from the
current sampling point $p_i$ along the estimated normal line
$N_i$ until it reaches the inner boundary of the epidermis (i.e., the
dermis and epidermis junction). This is illustrated in Fig. 12(b).
For each sampling point, the local thickness is measured (see the sketch below).
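The sketch below fits a quadratic through $p_{i-1}$, $p_i$, $p_{i+1}$, takes the normal to the tangent at $p_i$, and marches along that normal until it first hits the inner (dermis-epidermis) boundary mask. The marching step, the bidirectional search, and the quadratic fit are implementation assumptions.

```python
import numpy as np

def local_thickness(p_prev, p_i, p_next, inner_boundary_mask, step=1.0, max_dist=2000):
    """Local epidermis depth D_i at sampling point p_i (sketch).

    Points are (x, y) coordinates on the base boundary B; the three x-values
    are assumed distinct so that a quadratic y = f(x) can be fitted.
    inner_boundary_mask is True on the dermis-epidermis junction.
    """
    pts = np.array([p_prev, p_i, p_next], dtype=float)
    coeffs = np.polyfit(pts[:, 0], pts[:, 1], deg=2)           # local curve C
    slope = 2.0 * coeffs[0] * p_i[0] + coeffs[1]               # tangent T_i: dy/dx at p_i
    normal = np.array([-slope, 1.0]) / np.hypot(slope, 1.0)    # unit normal N_i

    p0 = np.array(p_i, dtype=float)
    for sign in (+1.0, -1.0):            # the junction may lie on either side of B
        pos = p0.copy()
        for _ in range(int(max_dist / step)):
            pos = pos + sign * step * normal
            r, c = int(round(pos[1])), int(round(pos[0]))
            if not (0 <= r < inner_boundary_mask.shape[0] and
                    0 <= c < inner_boundary_mask.shape[1]):
                break                    # left the image without hitting the junction
            if inner_boundary_mask[r, c]:
                return float(np.hypot(*(pos - p0)))            # D_i
    return np.nan
```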
3.5.2. Spatial distribution features
Based on the thickness measured in the last section, we divide
the epidermis into three sub-layers evenly. Note that in a normal
skin, the melanocytes are typically present at the bottom layer of
epidermis, near the junction of epidermis and dermis layer, known
as basal layer. In the case of superficial spreading melanoma, the
melanocytes invade into the middle layer of epidermis. Therefore,
dividing the epidermis into three sub-layers has been found to be
a good trade-off between simplicity and efficiency to quantify the
melanocytes distribution in the epidermis area. Let these three
sub-layers be denoted by $L_{inner}$, $L_{middle}$, and $L_{outer}$. This is illustrated
in Fig. 12(c) and (d), where the boundary of the three layers is
highlighted. The number of pre-segmented keratinocytes and
melanocytes which are located within these three sub-layers is
Fig. 11. Illustration of the RLS method: (a) is the original image with three nuclei regions where the melanocyte is indicated with the letter 'M', (b) shows the segmented nuclei
regions, (c) shows the radial lines for each nuclei region (nuclei regions are shown with thick contours), (d) shows the suppressed smoothed gradient map, and (e) shows the
halo region estimation result. The value of the parameter RHN is shown inside the nuclei region.
then calculated. The ratio of the melanocytes and keratinocytes
within these sub-layers is calculated as a set of features for the
classification. Denote $N_{Kera}$ and $N_{Mela}$ as the number of keratinocytes and melanocytes in the epidermis area, respectively. The ratio of the number of melanocytes to keratinocytes in the inner, middle, and outer layers is calculated as follows:

$R_{MK}^{L_{inner}} = N_{Mela}^{L_{inner}} / N_{Kera}^{L_{inner}}$   (7)

$R_{MK}^{L_{middle}} = N_{Mela}^{L_{middle}} / N_{Kera}^{L_{middle}}$   (8)

$R_{MK}^{L_{outer}} = N_{Mela}^{L_{outer}} / N_{Kera}^{L_{outer}}$   (9)
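As a sketch, the three ratios of Eqs. (7)-(9) can be computed by binning every detected cell into a sub-layer according to its depth relative to the local epidermis thickness. The depth-fraction boundaries at 1/3 and 2/3 follow the even three-way split above; counting only the non-melanocyte cells in the denominator is one possible reading of $N_{Kera}$.

```python
import numpy as np

def sublayer_ratios(cell_depth_fracs, is_melanocyte):
    """R_MK for the inner, middle and outer sub-layers (Eqs. (7)-(9)), sketch.

    cell_depth_fracs: per-cell depth normalized by the local epidermis thickness
                      (0 = outer surface, 1 = dermis-epidermis junction).
    is_melanocyte:    per-cell boolean flag (True for detected melanocytes).
    """
    d = np.asarray(cell_depth_fracs, dtype=float)
    m = np.asarray(is_melanocyte, dtype=bool)
    ratios = []
    for lo, hi in [(2/3, 1.0), (1/3, 2/3), (0.0, 1/3)]:   # inner, middle, outer
        in_layer = (d >= lo) & (d <= hi)
        n_mela = np.count_nonzero(in_layer & m)
        n_kera = np.count_nonzero(in_layer & ~m)
        ratios.append(n_mela / n_kera if n_kera else 0.0)
    return ratios   # [R_MK_inner, R_MK_middle, R_MK_outer]
```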
3.6. Features construction: morphological features of melanocytes
Besides the spatial distribution of the pre-segmented melanocytes, the morphological features are also important features for
the classification. The morphological features will reflect the
grading of the skin tissue under examination.
The five morphological features used in this paper are presented below:
Area: The area of the detected nuclei.
Perimeter: The perimeter of the detected nuclei.
Eccentricity: The eccentricity is the ratio of the distance between
the foci of the best fit ellipse and the major axis length.
Equivalent diameter: The diameter of a circle that has the same area as the current measured region, i.e., $\sqrt{4\,\mathrm{Area}/\pi}$.
Ellipticity: The ratio of the major axis and the minor axis of the
best fit ellipse.
For each feature, we calculate the mean and the standard deviation (SD) over the pre-segmented melanocytes in the epidermis area. This results in 10 morphological features.
In total, a 13-dimensional feature vector F (three spatial distribution features and ten morphological features) is constructed for each skin WSI.
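A sketch of the ten morphological features (mean and SD of the five shape measures over all detected melanocytes), computed here with scikit-image region properties; this is one possible interpretation of the definitions above, not the authors' exact implementation.

```python
import numpy as np
from skimage.measure import label, regionprops

def melanocyte_morphology_features(melanocyte_mask):
    """Mean and SD of area, perimeter, eccentricity, equivalent diameter and
    ellipticity over all detected melanocytes (10 features), sketch."""
    per_cell = []
    for r in regionprops(label(melanocyte_mask)):
        eq_diam = np.sqrt(4.0 * r.area / np.pi)                  # sqrt(4*Area/pi)
        ellipticity = r.major_axis_length / max(r.minor_axis_length, 1e-6)
        per_cell.append([r.area, r.perimeter, r.eccentricity, eq_diam, ellipticity])
    if not per_cell:
        return np.zeros(10)
    per_cell = np.asarray(per_cell, dtype=float)
    # Concatenating the means and standard deviations yields a 10-D vector.
    return np.concatenate([per_cell.mean(axis=0), per_cell.std(axis=0)])
```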
3.7. Classification
After the feature construction, a classification technique is applied
to classify the WSI into three classes. In this work, support vector
machine (SVM) is used for the classification. While the SVM is
originally proposed for two-class classification, it has been extended
for the multi-class classification [11,3,7]. The “one-against-one”
multi-class SVM method has been shown to provide better performance than other multi-class SVM methods [7]. In “one-against-one”
multi-class SVM method, the multi-class classification problem is
decomposed into n(n−1)/2 two-class classification problems, where
n is the number of classes. These n(n−1)/2 two-class SVMs are
trained using the samples from the corresponding paired classes.
The hyperplane separating the ith class and the jth class can be expressed as

$(w^{ij})^T \phi(F) + b^{ij} = 0$   (10)

where $w^{ij}$ and $b^{ij}$ are the coefficients of the hyperplane, and $\phi(\cdot)$ is the mapping function.
In the testing phase, the testing sample is labeled by all
n(n−1)/2 two-class SVMs. If the current SVM labels the sample as
the ith class, the vote count of the ith class is increased by 1. After
the testing sample has been labeled by all n(n−1)/2 two-class SVMs, the
accumulated votes are calculated. The class label that has the
Fig. 12. Epidermis depth measurement, and construction of sub-layers for the epidermis.
Table 2
Description of the image dataset.

Class                             No.    Percentage (%)
Superficial spreading melanoma    32     48.48
Melanocytic nevus                 17     25.76
Normal skin                       17     25.76
Total                             66     100

Table 3
Performance evaluations.

Technique                                                       Classification accuracy (%)
TCA technique                                                   70
Proposed technique with all (13) features                       89.07 ± 2.72
Proposed technique with only distribution-based (3) features    77.41 ± 1.19
Proposed technique with only morphological (10) features        83.21 ± 2.29
maximum votes is assigned to the testing sample. In this paper,
the SVM with a linear kernel is utilized for the classification.
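The one-against-one scheme described above with a linear kernel can be sketched with scikit-learn's SVC, which internally trains n(n−1)/2 pairwise classifiers and predicts by majority voting. The feature matrix, labels, and the added feature-standardization step are placeholders and assumptions, not taken from the paper.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Placeholder data: 66 slides, 13-dimensional feature vectors F.
X = np.random.rand(66, 13)
y = np.array(["melanoma"] * 32 + ["nevus"] * 17 + ["normal"] * 17)

# Linear-kernel SVM; SVC handles multi-class problems with one-against-one voting.
clf = make_pipeline(StandardScaler(),
                    SVC(kernel="linear", decision_function_shape="ovo"))
clf.fit(X, y)
print(clf.predict(X[:5]))
```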
4. Experimental results

4.1. Image dataset

In this study, we have evaluated the proposed technique on 66
different whole slide cutaneous histopathological images. The
histological sections used for image acquisition were prepared from
formalin-fixed, paraffin-embedded tissue blocks of skin biopsies.
The prepared sections are about 4 μm thick and are stained with
H&E using an automated stainer. The skin samples include
superficial spreading melanoma, melanocytic nevus, and normal
skin. Table 2 shows the distribution of these three classes. The
digital images were captured at 40× magnification on a Carl
Zeiss MIRAX MIDI scanning system (Carl Zeiss Inc., Germany).

4.2. Classification results

We use 10-fold cross validation [2] for the evaluation. In
other words, we first divide the whole dataset into 10 subsets. We
then use 9 subsets as the training set and the remaining subset
as the testing set, and obtain the performance. We repeat this
procedure 10 times with different testing sets. We also repeat
Fig. 13. An example of a misclassified skin tissue: (a) a snapshot of a skin WSI and (b) a zoomed-in view of the rectangular area in (a).
Table 4
Top-8 important features.

Index   Feature name                    Classification accuracy (%)
1       Eccentricity-Std                66.77 ± 3.38
2       $R_{MK}^{L_{inner}}$            64.64 ± 3.73
3       Perimeter-Mean                  59.68 ± 4.41
4       $R_{MK}^{L_{outer}}$            59.54 ± 6.04
5       Equivalent Diameter-Mean        59.45 ± 5.31
6       Area-Mean                       57.62 ± 6.16
7       $R_{MK}^{L_{middle}}$           55.85 ± 2.87
8       Ellipticity-Mean                54.22 ± 6.87
the 10-fold cross validation 100 times, and the average performance
is used as the final classification performance.
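The evaluation protocol above can be expressed as repeated 10-fold cross validation, averaging the accuracy over all folds and repetitions, as sketched below; the use of stratified folds and the placeholder data are assumptions.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC

X = np.random.rand(66, 13)                                  # placeholder feature vectors
y = np.array(["melanoma"] * 32 + ["nevus"] * 17 + ["normal"] * 17)
clf = SVC(kernel="linear")

accuracies = []
for repeat in range(100):                                   # 100 repetitions of 10-fold CV
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=repeat)
    accuracies.extend(cross_val_score(clf, X, y, cv=cv, scoring="accuracy"))

print(f"accuracy: {np.mean(accuracies) * 100:.2f} +/- {np.std(accuracies) * 100:.2f} %")
```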
It is observed from Table 3 that the proposed technique is able to classify
most of the skin tissues correctly when all features are used. If only the
distribution-based features are used, the classification accuracy
drops to about 78%. If only the morphological
features are used, about 84% accuracy is achieved.
Since, to the authors' knowledge, there is no existing technique
for skin WSI classification in the literature, we have adopted and
implemented the TCA technique [28] for comparison. The classification accuracy of the TCA technique is shown in the first row of
Table 3. In the implementation of the TCA technique, we randomly
selected 12 WSIs for the training phase and selected the remaining
WSIs for the testing phase. In the training phase, a WSI is first
divided into small image blocks. In each WSI, the background blocks
and the tissue blocks are manually labeled. Within the tissue blocks,
the blocks containing cells (denoted as cell block) and the blocks
containing other cytological components are manually labeled.
Within the cell blocks, the blocks containing malignant cells and
the blocks containing benign cells are manually labeled. The features
extracted from these three pairs of blocks (i.e., background and
tissue blocks, cell blocks and other cytological components blocks,
malignant cells and benign cells blocks) are then used to train three
classifiers. In the testing phase, the WSI is divided into small image
blocks. Each block is then classified by the pre-trained classifiers.
The percentages of the malignant cell blocks and benign cell blocks
with respect to the total number of blocks are then calculated. If
the percentage of the malignant cells blocks is dominant (greater
than 50%), a WSI is determined as a melanoma, otherwise normal or
nevus tissue. Note that in the implementation of the TCA technique,
we consider a two-class classification problem, i.e., the malignant
tissue (melanoma) versus benign tissue (nevus and normal skin).
It is observed in Table 3 that the classification performance of the
TCA technique is 70%, which is lower than that of the proposed technique.
One reason for this is that the TCA technique only
considers pixel-level features, so staining variations degrade its
classification performance. In contrast, the proposed technique considers object-level features, which are less affected by
staining variations. Another drawback of the TCA technique is that
it requires user interaction for the selection of training blocks, whereas
the proposed technique requires much less user interaction.
An example of the misclassification by the proposed technique is
shown in Fig. 13. This skin tissue belongs to the melanoma class
(in-transit metastasis type). The proposed technique misclassified
this skin tissue as the normal class. Because this melanoma is of the
in-transit metastasis type, the spatial distribution and the morphological features in the epidermis of the WSI
look very similar to those of normal skin, and the proposed
technique therefore misclassifies this melanoma tissue as normal.
Incorporating additional features from the dermis area would be required to
classify it correctly as melanoma.
4.3. Importance of features
We evaluated the importance of each individual feature by
measuring its contribution to the classification. Instead of using
all 13 features, we used each of the 13 features individually
for the classification and recorded the corresponding
classification accuracy. Table 4 shows the top-8
important features according to the classification accuracy.
4.4. Parameter selections
Several parameters are used in the proposed framework. In the
keratinocytes segmentation module, there is a pair of parameters
named predefined area range [Amin, Amax]. These parameters are
used for calculating the cost function C(t) in Eq. (4) to find the
optimum threshold for keratinocytes segmentation. The area range
[Amin, Amax] aims to define a possible size range of keratinocytes in
the image. If the range deviates greatly from the real situation,
the keratinocytes segmentation result will be degraded and the
final classification result will not be accurate. For example, under a
certain magnification, a typical keratinocyte contains approximately
300 pixels and an atypical keratinocyte contains about 400 pixels. However,
if the range is set to [100, 800], larger segmented fragments will be
Fig. 14. Evaluation of parameter selection: (a) parameter Amax in [200, 1000] and the corresponding classification accuracy, and (b) parameter $T_{RHN}$ in [0.1, 1.3] and the
corresponding classification accuracy.
included in the final segmentation result, and this will lead to under-segmentation (many nuclei will be treated as one object). This will
affect the spatial distribution features, described in Section 3.5.2,
where the counting of the keratinocytes is used. The area range
[Amin, Amax] can easily be estimated using a few sample images with
keratinocytes. In the proposed framework, [Amin, Amax] is set to [100,
500]. In order to evaluate the effect of tuning the area range [Amin,
Amax], we set Amin = 100 and tune Amax from 200 to 1000
with a step of 50. The corresponding classification accuracy is
shown in Fig. 14(a). It is observed that the classification accuracy
is above 85% when the Amax is around 300–700, whereas the
classification performance decreases if Amax is larger than 700.
In the melanocytes detection module, there is a parameter
$T_{RHN}$ (see Section 3.4) that determines whether a keratinocyte
is a melanocyte or another cell type. In this module,
the ratio between the areas of the halo and nuclei regions, $R_{HN}$,
is used as a descriptor to differentiate the melanocytes from other
types of cells. Thus the threshold $T_{RHN}$ is important for achieving a high
detection accuracy. Similar to the parameter [Amin, Amax] in the
keratinocyte segmentation module, the threshold $T_{RHN}$ affects
the spatial distribution features, described in Section 3.5.2, where
the counting of the melanocytes is used. The parameter $T_{RHN}$ can be
determined by using a few sample images of epidermis that
contain melanocytes and keratinocytes. In the proposed framework, $T_{RHN}$ is set to 0.8. In order to evaluate the effect of tuning
$T_{RHN}$, we vary it from 0.1 to 1.3 with a step of 0.1, and the
classification accuracy is evaluated. Fig. 14(b) shows the evaluation
result. It is observed that the classification accuracy is above 85%
when $T_{RHN}$ is around 0.4–1.0.
4.5. Computational complexity evaluation
All experiments were carried out on a 2.4-GHz Intel Core 2
Duo CPU with 3 GB RAM using MATLAB 7.04. On average, the
proposed technique takes about 36 min to finish the analysis of a
12,000 × 10,000 color WSI, which contains about 2300 nuclei and
350 melanocytes in the epidermis area. The epidermis segmentation module takes 0.5% of the whole processing time, the keratinocytes segmentation module takes 2.6%, the melanocytes detection module takes 94.4%, and the feature construction and classification modules together take 2.5% of the whole processing time.
5. Conclusion
In this paper, we present an automatic analysis and classification
technique for the melanotic skin WSI. The epidermis area, keratinocytes, and melanocytes are first segmented by automatic
techniques. The spatial distribution and morphological features
based on the pre-segmented ROIs are then extracted and used for classification.
The proposed technique provides about 90% accuracy in the
classification of melanotic nevus, melanoma, and normal skin. In
the future, features in the dermis area will be analyzed in order to improve
the classification performance.
Conflict of interest
None declared.
Acknowledgment
The authors would like to acknowledge the funding of the
National Natural Science Foundation of China (Grant no. 61401263)
and Fundamental Research Funds for the Central Universities of
China (Grant no. GK201402037). We thank Dr. Naresh Jha, and
Dr. Muhammad Mahmood of the University of Alberta Hospital for
providing the images and giving helpful advice.
References
[1] American Cancer Society, What are the Key Statistics About Melanoma?
Technical Report, American Cancer Society, 2008.
[2] C. Bishop, Neural Networks for Pattern Recognition, Oxford University Press,
Oxford, UK, 1995.
[3] K. Crammer, Y. Singer, On the learnability and design of output codes for
multiclass problems, Mach. Learn. 47 (2) (2002) 201–233.
[4] S. Doyle, M. Feldman, J. Tomaszewski, A. Madabhushi, A boosted Bayesian
multi-resolution classifier for prostate cancer detection from digitized needle
biopsies. IEEE Trans. Biomed. Eng. 59(5) (2012) 1205–1218.
[5] R. Gonzalez, R. Woods, Digital Image Processing, 2002.
[6] R. Haralick, K. Shanmugam, I. Dinstein, Textural features for image classification, IEEE Trans. Syst. Man Cybern. 3 (6) (1973) 610–621.
[7] C. Hsu, C. Lin, A comparison of methods for multiclass support vector
machines, IEEE Trans. Neural Netw. 13 (2) (2002) 415–425.
[8] C. Huang, A. Veillard, L. Roux, N. Lomenie, D. Racoceanu, Time-efficient sparse
analysis of histopathological whole slide images, Comput. Med. Imaging
Graph. 35 (7) (2010) 579–591.
[9] S. Ismail, A. Colclough, J. Dinnen, D. Eakins, D. Evans, E. Gradwell, J. O'Sullivan,
J. Summerell, R. Newcombe, Observer variation in histopathological diagnosis and
grading of cervical intraepithelial neoplasia, Br. Med. J. 298 (6675) (1989) 707.
[10] J. Kong, O. Sertel, H. Shimada, K.L. Boyer, J.H. Saltz, M.N. Gurcan, Computer-aided
evaluation of neuroblastoma on whole-slide histology images: classifying grade
of neuroblastic differentiation, Pattern Recognit. 42 (6) (2009) 1080–1092.
[11] U. Kreßel, Pairwise classification and support vector machines, in: Advances in
Kernel Methods, MIT Press, Cambridge, MA, 1999, pp. 255–268.
[12] V. Kumar, A. Abbas, N. Fausto, et al., Robbins and Cotran Pathologic Basis of
Disease, Elsevier Saunders, Philadelphia, 2005.
[13] C. Lu, M. Mahmood, M. Jha, A robust automatic nuclei segmentation technique
for quantitative histopathological image analysis, Anal. Quant. Cytol. Histopathol. 12 (2012) 296–308.
[14] C. Lu, M. Mahmood, N. Jha, M. Mandal, Automated segmentation of the
melanocytes in skin histopathological images, IEEE J. Biomed. Health Informatics 17 (2) (2013) 284–296.
[15] C. Lu, M. Mahmood, N. Jha, M. Mandal, Detection of melanocytes in skin
histopathological images using radial line scanning, Pattern Recognit. 46 (2)
(2013) 509–518.
[16] C. Lu, M. Mandal, Automated segmentation and analysis of the epidermis area in
skin histopathological images, in: 2012 Annual International Conference of the
IEEE Engineering in Medicine and Biology Society (EMBC), 2012, pp. 5355–5359.
[17] I. Maglogiannis, C. Doukas, Overview of advanced computer vision systems for
skin lesions characterization, IEEE Trans. Inf. Technol. Biomed. 13 (5) (2009)
721–733.
[18] M. Mete, X. Xu, C.-Y. Fan, G. Shafirstein, Automatic delineation of malignancy
in histopathological head and neck slides, BMC Bioinform. 8 (Suppl 7) (2007)
S17, doi:http://dx.doi.org/10.1186/1471-2105-8-S7-S17.
[19] M. Niethammer, D. Borland, J. Marron, J. Woosley, N.E. Thomas, Appearance
normalization of histology slides, in: Machine Learning in Medical Imaging,
Springer, Berlin, Heidelberg, 2010, pp. 58–66.
[20] N. Otsu, A threshold selection method from gray-level histograms, IEEE Trans.
Syst. Man Cybern. 9 (1) (1979) 62–66.
[21] S. Petushi, F.U. Garcia, M.M. Haber, C. Katsinis, A. Tozeren, Large-scale
computations on histology images reveal grade-differentiating parameters
for breast cancer, BMC Med. Imaging 6 (2006) 14.
[22] D. Rigel, J. Russak, R. Friedman, The evolution of melanoma diagnosis: 25 years
beyond the ABCDs, CA Cancer J. Clin. 60 (5) (2010) 301–316.
[23] M. Rojo, G. García, C. Mateos, J. García, M. Vicente, Critical comparison of 31
commercially available digital slide systems in pathology, Int. J. Surg. Pathol.
14 (4) (2006) 285–305.
[24] V. Roullier, O. Lezoray, V. Ta, A. Elmoataz, Multi-resolution graph-based
analysis of histopathological whole slide images: application to mitotic cell
extraction and visualization. Comput. Med. Imaging Graph. (2011).
[25] O. Sertel, J. Kong, H. Shimada, U.V. Catalyurek, J.H. Saltz, M.N. Gurcan,
Computer-aided prognosis of neuroblastoma on whole-slide images: Classification of stromal development, Pattern Recognit. 42 (6) (2009) 1093–1103.
[26] N. Signolle, M. Revenu, B. Plancoulaine, P. Herlin, Wavelet-based multiscale
texture segmentation: application to stromal compartment characterization
on virtual slides, Signal Process. 90 (8) (2010) 2412–2422.
[27] J. Smolle, Computer recognition of skin structures using discriminant and
cluster analysis, Skin Res. Technol. 6 (2) (2000) 58–63.
[28] J. Smolle, A. Gerger, Tissue counter analysis of tissue components in skin
biopsies—evaluation using cart (classification and regression trees), Am.
J. Dermatopathol. 25 (3) (2003) 215–222.
[29] M. Vestergaard, P. Macaskill, P. Holt, S. Menzies, Dermoscopy compared with
naked eye examination for the diagnosis of primary melanoma: a meta-analysis of studies performed in a clinical setting, Br. J. Dermatol. 159 (3)
(2008) 669–676.
[30] Y. Wang, D. Crookes, O.S. Eldin, S. Wang, P. Hamilton, J. Diamond, Assisted
diagnosis of cervical intraepithelial neoplasia (cin), IEEE J. Sel. Top. Signal
Process. 3 (1) (2009) 112–121.
[31] D. Weedon, G. Strutton, Skin Pathology, vol. 430, Churchill Livingstone, New
York, 2002.
[32] R. Weinstein, A. Graham, L. Richter, G. Barker, E. Krupinski, A. Lopez, K. Erps,
A. Bhattacharyya, Y. Yagi, J. Gilbertson, Overview of telepathology, virtual
microscopy, and whole slide imaging: prospects for the future, Hum. Pathol.
40 (8) (2009) 1057.
Cheng Lu received his Ph.D. degree in electrical and computer engineering from the University of Alberta in 2013. He is now working at Shaanxi Normal University, China. He is
the recipient of Graduate Student Interdisciplinary Research Award 2012, University of Alberta. His research interests include medical image analysis, pattern recognition and
computer vision. He is an author or coauthor of more than ten papers in leading international journals and conferences.
Mrinal Mandal is a Full Professor and an Associate Chair in the Department of Electrical and Computer Engineering and is the Director of the Multimedia Computing and
Communications Laboratory at the University of Alberta, Edmonton, Canada. He has authored the book Multimedia Signals and Systems (Kluwer Academic), and coauthored
the book Continuous and Discrete Time Signals and Systems (Cambridge University Press). His current research interests include Multimedia, Image and Video Processing,
Multimedia Communications, Medical Image Analysis. He has published over 140 papers in refereed journals and conferences, and has a US patent on lifting wavelet
transform architecture. He has been the Principal Investigator of projects funded by Canadian Networks of Centers of Excellence such as CITR and MICRONET, and is currently
the Principal Investigator of a project funded by the NSERC. He was a recipient of Canadian Commonwealth Fellowship from 1993 to 1998, and Humboldt Research
Fellowship from 2005 to 2006 at Technical University of Berlin.