Pattern Recognition 48 (2015) 2738–2750

Automated analysis and diagnosis of skin melanoma on whole slide histopathological images

Cheng Lu a,⁎, Mrinal Mandal b

a College of Computer Science, Shaanxi Normal University, Xi'an, Shaanxi Province 710119, China
b Department of Electrical and Computer Engineering, University of Alberta, Edmonton, Alberta, Canada T6G 2V4

⁎ Corresponding author. Tel./fax: +86 29 85310161. E-mail addresses: chenglu@snnu.edu.cn (C. Lu), mmandal@ualberta.ca (M. Mandal).

Article history: Received 12 February 2013; Received in revised form 19 August 2014; Accepted 25 February 2015; Available online 5 March 2015.

Abstract: Melanoma is the most aggressive type of skin cancer, and the pathological examination remains the gold standard for the final diagnosis. Traditionally, the histopathology slides are examined under a microscope by pathologists, which typically leads to inter- and intra-observer variations. In addition, it is time consuming and tedious to analyze a whole glass slide manually. In this paper, we propose an efficient technique for automated analysis and diagnosis of the skin whole slide image. The proposed technique consists of five modules: epidermis segmentation, keratinocytes segmentation, melanocytes detection, feature construction and classification. Since the epidermis, keratinocytes and melanocytes are important cues for the pathologists, these regions are first segmented. Based on the segmented regions of interest, spatial distribution and morphological features are constructed. These features, representing a skin tissue, are classified by a multi-class support vector machine classifier. Experimental results show that the proposed technique provides a satisfactory performance (about 90% classification accuracy) and is able to assist the pathologist in skin tissue analysis and diagnosis.

Keywords: Histopathological image analysis; Object detection; Image analysis; Skin cancer; Melanoma

1. Introduction

Cancer is a major cause of death all over the world. It is caused by the uncontrolled growth of abnormal cells in the body. Skin cancer is one of the most frequent types of cancer, and melanoma is the most aggressive type of skin cancer [17]. According to a recent article, approximately 70,000 people are diagnosed with melanoma skin cancer, and about 9000 die from it in the United States alone every year [1]. Malignant melanoma is curable if it is diagnosed at an early stage [17]. Therefore, early detection and accurate prognosis of malignant melanoma will help to lower the mortality from this cancer.

Approaches to melanoma diagnosis have evolved considerably during the past 25 years [22]. Although there are many new emerging techniques, e.g., dermatoscopy [29], that can provide an initial diagnosis, the pathological examination remains the gold standard for the final diagnosis. In addition, useful prognostic information for the clinical management of the patient can also be provided by the histological examination. The histopathology slides provide a cellular-level view of the diseased cells and tissue and are considered the "gold standard" in the diagnosis of almost all kinds of cancer [12].
Traditionally, the histopathology slides are examined under a microscope by pathologists. With the help of whole slide histology digital scanners, glass slides of tissue specimens can now be digitized at high magnification to create the whole slide image (WSI) [23,32]. Such high resolution images are similar to what a pathologist observes under a microscope to diagnose the biopsy, and the pathologist can now perform the examination on the WSI instead of using the microscope. Note that a WSI requires a large amount of storage space and considerable computing power for processing. For example, a 20 mm × 20 mm glass slide tissue scanned at a resolution of 0.11625 µm/pixel (at 40× magnification) consists of about 2.96 × 10^10 pixels and requires approximately 80 GB of storage space in uncompressed color format (24 bits/pixel). Therefore, it is time consuming and difficult to analyze a WSI manually. In addition, the manual diagnosis is subjective and often leads to intra-observer and inter-observer variability [9].

In order to address the above-mentioned problems, automated computational tools that can provide reliable, reproducible and objective results for quantitative analysis are needed. Recently, many researchers have applied sophisticated digital image analysis techniques to extract objective and accurate prognostic clues throughout the WSI. These computer-aided systems have shown promising initial results in the case of breast cancer [21,8], prostate cancer [4], head and neck cancer [18], cervical cancer [30], neuroblastoma [25,10], and ovarian cancer [26].

In this work, we propose an automatic analysis and diagnosis technique for the whole slide skin pathological image. The goal of the proposed technique is to provide quantitative measures which will help the pathologist with the diagnosis, and to classify melanoma, melanotic nevus and normal skin biopsies automatically.

This paper is organized as follows. Section 2 presents related work on automatic WSI analysis and skin tissue analysis. Section 3 presents the proposed technique, which consists of five modules. Experimental results, including a description of the image dataset, are presented in Section 4, followed by the conclusion in Section 5.

2. Related works

2.1. WSI analysis techniques

Whole slide histopathological image analysis has become an attractive research topic in recent years [25,10,4]. The WSI is able to provide global information about a tissue specimen for quantitative image analysis. However, automated WSI analysis is challenging because of its high computational complexity. Several techniques based on a multi-resolution framework have been proposed. Table 1 summarizes the state-of-the-art literature on WSI analysis techniques.

Petushi et al. [21] employed gray scale conversion, adaptive thresholding and morphological operations to segment the nuclei and identify the high nuclei density regions in invasive breast carcinoma WSIs. Classification of the tissue type is then performed based on the features extracted from the pre-segmented areas using different classifiers. A classification accuracy of 68% has been reported on a database of 24 WSIs. Mete et al. [18] proposed a block-based supervised technique for detection of malignant regions in histopathological head and neck slides. With pre-defined training subimages (128 × 128 pixels), this technique first extracts the most prominent colors that are present in the positive and negative training samples.
These colors are then clustered into several groups that are used to train an SVM. For malignancy detection in a candidate image, the color information is extracted and classified by the pre-trained SVM. The technique provides a good performance; however, the performance may suffer from the staining color variations of the WSIs. Wang et al. [30] developed an automated computer-aided technique for the diagnosis of cervical intraepithelial neoplasia. This technique first segments the epithelium using prior knowledge of the tissue distribution. Based on the measurements and features obtained from the squamous epithelium, an SVM is employed to perform the classification. This CAD technique has been reported to achieve 94.25% accuracy for four-class tissue classification on 31 digital whole slides. Sertel et al. [25] developed a multi-scale CAD technique for classification of stromal development in neuroblastoma. This technique uses texture features extracted using co-occurrence statistics and local binary patterns. A modified K-NN classifier was employed to determine the classification confidence at each resolution. The experimental results showed an overall classification accuracy of 88.4%. Kong et al. [10] developed a similar CAD technique for classification of the grades of neuroblastic differentiation. This technique first segments a WSI into multiple cytological components (e.g., nuclei, cytoplasm, RBCs) at each resolution level using an expectation–maximization approach. The cytological and statistical features derived from the pre-segmented results are then fed to a multi-classifier combiner for training. The trained classification technique was tested on 33 WSIs with 87.88% accuracy. Roullier et al. [24] proposed a multi-resolution graph-based analysis framework for the WSI analysis of breast cancer. A 2-means clustering is applied on the histogram of the regularized image to separate the region of interest from the background. Spatial refinement based on discrete label regularization is used to achieve accurate segmentation around the boundary. The above steps are repeated at four different resolutions (from lower to higher). Finally, the mitoses are identified in the labeled region of interest using the domain knowledge that mitoses are visually recognized by the red–cyan color difference.

2.2. Skin tissue analysis techniques

Since different tissue types have different specific diagnostic features considered by the pathologists, the techniques mentioned in Section 2.1 cannot be directly applied to skin tissue analysis and diagnosis. Unlike other types of tissue specimen, a typical skin tissue slide consists of three main parts: epidermis, dermis and sebaceous tissues. The anatomy of a typical skin tissue is shown in Fig. 1, where the lower image shows the manually labeled contour of the epidermis. The epidermis is an important observation area in skin melanoma diagnosis, and the morphological features and distribution of the objects of interest are most useful in diagnosis. Two examples of skin WSI are shown in Fig. 2. Fig. 2(a) shows an example of melanocytic nevus, where melanocytes are invading into the dermis area (the invading melanocytes appear as the dark blue area in the dermis). Fig. 2(b) shows an example of superficial spreading melanoma, where the image looks like normal skin tissue unless the epidermis area is examined carefully.
For automated diagnosis of the skin tissue, the existing techniques mentioned in Section 2.1 are not expected to yield satisfactory results, since most of them focus on color or texture features at the pixel level [18,25,10,30].

Table 1. Related works on whole slide histopathology image analysis.

Tissue type    | Reference               | Dataset info.
Breast         | Petushi et al. [21]     | 24 H&E stained, 20×
Head and neck  | Mete et al. [18]        | 7 H&E stained, 20×
Cervical       | Wang et al. [30]        | 31 H&E stained, 40×
Neuroblastoma  | Sertel et al. [25]      | 43 H&E stained, 40×
Neuroblastoma  | Kong et al. [10]        | 33 H&E stained, 40×
Breast         | Roullier et al. [24]    | H&E stained, 20×
Skin           | Smolle and Gerger [28]  | H&E stained, 10×

Fig. 1. The anatomy of a skin tissue.

Fig. 2. Two examples of skin WSI. (a) An example of melanocytic nevus. (b) An example of superficial spreading melanoma. (For interpretation of the references to color in this figure, the reader is referred to the web version of this paper.)

Fig. 3. The overall schematic of the proposed technique.

A few research works have been reported on the histopathological image analysis of skin cancer. Smolle [27] performed a pilot study on digital skin histopathological image analysis. A selected portion of the whole skin tissue slide is first digitized into an image. This image of the skin specimen is then divided into non-overlapping blocks of equal size and shape. A set of grey level, color and Haralick texture features [6] is calculated for each image block. Finally, based on the extracted features, multidimensional linear discriminant analysis and hierarchical clustering are applied to classify the image blocks into two classes: normal and malignant. Smolle and Gerger [28] extended the block-based analysis scheme by applying tissue counter analysis (TCA) to histologic sections of skin tumors. The basic idea of TCA is to "count" the classified image blocks in a given histological specimen. The histologic specimen is then labeled as "benign" or "malignant", based on the relative proportion of image blocks that belong to the different classes. Experimental results showed that TCA might be a useful method for the interpretation of melanocytic skin tumors.

The above-mentioned skin tissue analysis techniques only establish the feasibility of texture and color features and TCA for classification of different types of sampled skin areas from the whole glass slide. There are several limitations of these techniques [27,28]. First, they only consider pixel-level features (i.e., texture and color features) of the sampled area, whereas object-level features, e.g., the distribution of melanocytes, are of interest to the pathologist. Second, they may fail when there is staining variation. Third, they only consider a representative region within the whole glass slide, and therefore appropriate selection of the representative region requires user interaction.

3. The proposed technique

3.1. The overview of the proposed technique

The schematic of the proposed technique is shown in Fig. 3. The proposed technique consists of five modules. Given a WSI, the first three cascaded modules are used for the segmentation and detection of ROIs. The epidermis area segmentation module aims to segment the skin epidermis in the WSI. The keratinocytes segmentation module is applied to segment the keratinocytes.
Note that the keratinocytes are the cells (which also include the melanocytes) present in the epidermis area. The melanocytes detection module is used to detect the melanocytes among the pre-segmented keratinocytes. Based on the pre-segmented epidermis area, keratinocytes, and melanocytes, the feature construction module considers the spatial arrangement and morphological characteristics of the pre-segmented ROIs and constructs features according to the diagnostic factors used by pathologists. The last module, i.e., the classification module, utilizes the features to classify the tissue into three categories: melanoma, nevus, or normal skin. Note that before processing, we perform a color normalization operation [19] on all images.

3.2. Epidermis area segmentation

The epidermis area is the most important observation area for the diagnosis of a skin tissue. In most cases of melanoma, the grading of the cancer can be made by analyzing the architectural and morphological features of atypical cells in the epidermis or the epidermis–dermis junctional area. Therefore, segmentation of the epidermis area is an important step before further analysis is performed. In one of our previous works [16], an efficient technique for automatic epidermis area segmentation and analysis has been proposed. This technique is adopted here for the epidermis segmentation and high resolution (HR) image tile generation. The schematic of the epidermis area segmentation technique is shown in Fig. 4, and its steps are explained as follows [16]:

Down sampling: In order to reduce the processing time, the high resolution (HR) WSI is first down sampled into a low resolution (LR) WSI.

Segmentation of epidermis: In this step, the red channel of the WSI is selected for the segmentation since it provides the most discrimination between the epidermis and other regions [16]. Otsu's threshold method [20] is then applied for the initial segmentation. In order to eliminate false positive regions after the thresholding, two criteria based on area size and shape are used to determine the real epidermis area. Finally, morphological operations (e.g., opening and closing) are applied to fill in the holes and smooth the boundary. An epidermis mask is thus obtained.

Analysis of epidermis layout: Based on the epidermis mask, the epidermis layout is determined. The layout of the epidermis is defined as horizontal or vertical. The entire epidermis mask is first divided into small grids. Within each grid, the percentage of bright pixels is counted. If the percentage of bright pixels is greater than a threshold, the grid is assigned the value 1, otherwise 0. A straight line is then fitted through the grids with value 1 to determine the acute angle between the epidermis and the horizontal line. If the angle is greater than or equal to 45°, the epidermis layout is vertical, otherwise horizontal.

Generation of HR image tiles: The segmented epidermis in LR is first mapped to HR. The whole segmented epidermis area is then divided into several image tiles for further analysis. Based on the layout of the epidermis, the HR image tiles are generated across the epidermis area so that in each image tile all sublayers of the epidermis can be observed. If the layout is vertical (horizontal), we generate the image tiles by stitching the image blocks with the segmented epidermis mask horizontally (vertically).
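To make these steps concrete, a minimal Python sketch of the thresholding and layout-analysis stages is given below. It assumes the downsampled LR image is in memory as a NumPy array; the structuring-element sizes, grid size, area criterion and the assumption that the epidermis falls on the darker side of Otsu's threshold in the red channel are illustrative simplifications of [16], not the exact implementation.

```python
# Sketch of epidermis thresholding and layout analysis (Section 3.2).
import numpy as np
from skimage.filters import threshold_otsu
from skimage.morphology import binary_opening, binary_closing, disk, remove_small_objects

def segment_epidermis_lr(rgb_lr, min_area=5000):
    """Return a binary epidermis mask from a low-resolution RGB image."""
    red = rgb_lr[:, :, 0].astype(float)     # red channel is most discriminative [16]
    t = threshold_otsu(red)
    mask = red < t                          # epidermis assumed darker (hematoxylin-rich)
    mask = binary_opening(mask, disk(3))    # remove small spurious responses
    mask = binary_closing(mask, disk(5))    # fill holes, smooth the boundary
    # crude stand-in for the area/shape criteria of the paper
    mask = remove_small_objects(mask, min_size=min_area)
    return mask

def epidermis_layout(mask, grid=32, frac=0.5):
    """Classify the epidermis layout as 'horizontal' or 'vertical'."""
    h, w = mask.shape
    xs, ys = [], []
    for i in range(0, h - grid + 1, grid):
        for j in range(0, w - grid + 1, grid):
            if mask[i:i + grid, j:j + grid].mean() > frac:   # grid assigned value 1
                ys.append(i + grid / 2)
                xs.append(j + grid / 2)
    slope = np.polyfit(xs, ys, 1)[0]        # straight-line fit through grid centres
    angle = np.degrees(np.arctan(abs(slope)))
    return "vertical" if angle >= 45 else "horizontal"
```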
An example is illustrated in Fig. 5, where the binary mask represents the segmented epidermis area: the white region indicates the epidermis area and the black region indicates the background and other regions. The layout of the epidermis is horizontal in this example, and therefore the image tiles are generated vertically. The thick rectangles indicate the generated image tiles for further processing, and some snapshots of the HR image tiles are also shown in Fig. 5.

Three epidermis area segmentation results are shown in Fig. 6. In Fig. 6(a), (c), and (e), the original images with the manually labeled epidermis area (shown as a bright contour) are shown. Fig. 6(b), (d), and (f) shows the binary masks that represent the segmentation results.

Fig. 4. The schematic for the epidermis area segmentation module.

Fig. 5. An example of generating the image tiles from the binary epidermis mask (shown in the center). Note that the layout of the epidermis is horizontal in this example, and hence the image tiles are generated vertically. The rectangles indicate the generated image tiles for further processing. Some of the snapshots of the image tiles are also shown.

Fig. 6. Three examples of the epidermis segmentation results. (a), (c), and (e) show original images with the manually labeled epidermis area (shown as a bright contour). (b), (d), and (f) show the binary images that represent the segmentation result, where the white region indicates the epidermis area and the black region indicates the background and other regions.

3.3. Keratinocytes segmentation

Based on the HR image tiles obtained in Section 3.2, it is now possible to segment the keratinocytes in the epidermis area. Considering the intensity variations, an adaptive threshold technique is applied to segment the nuclei regions [13]. This technique has two main steps, which are described as follows.

3.3.1. Hybrid gray-scale morphological reconstruction (HGMR)

In order to reduce the influence of undesirable variations in the image and to make the nuclei regions homogeneous, a hybrid gray-scale morphological reconstruction method is applied. The steps of HGMR are described below:

(a) Complement of the image: Assuming an 8-bit image R, the complement image \bar{R} is calculated as follows:

\bar{R}(x, y) = 255 - R(x, y)    (1)

where (x, y) is the pixel coordinate.

(b) Opening-by-reconstruction: In order to enhance the nuclei regions, the opening-by-reconstruction operation [5] is performed on the image \bar{R} as follows:

R_{obr} = \rho(\bar{R}_e;\ \bar{R})    (2)

where \rho(\cdot;\cdot) is the morphological reconstruction operator [5], \bar{R}_e = \bar{R} \ominus S (where \ominus is the erosion operator), and S is the structuring element. The eroded image, obtained by applying erosion to Fig. 7(b), is shown in Fig. 7(c). The result of the opening-by-reconstruction is shown in Fig. 7(d). Comparing Fig. 7(d) and (b), it is observed that the nuclei regions have been enhanced.

(c) Closing-by-reconstruction: In order to reduce the noise further, the closing-by-reconstruction is performed on R_{obr} as follows:

R_{obrcbr} = 255 - \rho((255 - R_{obr}) \ominus S;\ 255 - R_{obr})    (3)

The result of the closing-by-reconstruction is shown in Fig. 7(g). Note that the intensity within the nuclei regions is more homogeneous compared to that in Fig. 7(d).

(d) Complement of the image: This step calculates the complement of R_{obrcbr} in order to map the image back into the original intensity space, i.e., R'(x, y) = 255 - R_{obrcbr}(x, y). The final image is shown in Fig. 7(h).

Fig. 7. Example of hybrid gray-scale morphological reconstruction: (a) original image, (b) complement of the image, (c) image after erosion, (d) opening-by-reconstruction, (e) complement of the image, (f) image after erosion, (g) closing-by-reconstruction, and (h) final result.
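For reference, the HGMR chain of Eqs. (1)–(3) could be written as the following Python sketch, using scikit-image's grayscale reconstruction for the operator ρ; the structuring element size is an illustrative choice.

```python
# Sketch of hybrid gray-scale morphological reconstruction (Section 3.3.1).
import numpy as np
from skimage.morphology import erosion, disk, reconstruction

def hgmr(gray_u8, selem=disk(5)):
    """gray_u8: 8-bit gray-scale image tile of the epidermis area."""
    r_bar = 255 - gray_u8.astype(np.int32)                       # (a) complement, Eq. (1)
    # (b) opening-by-reconstruction: erode, then reconstruct by dilation, Eq. (2)
    r_obr = reconstruction(erosion(r_bar, selem), r_bar, method='dilation')
    # (c) closing-by-reconstruction on the complement, Eq. (3)
    c = 255 - r_obr
    r_obrcbr = 255 - reconstruction(erosion(c, selem), c, method='dilation')
    # (d) map back to the original intensity space
    return (255 - r_obrcbr).astype(np.uint8)
```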
3.3.2. Local region adaptive threshold selection

For the initial segmentation, a global threshold (Otsu's threshold method [20]) is first applied on the image. We have found that the touching of multiple nuclei often results in an abnormally large region (ALR) and degrades the segmentation performance. To address this, a local region adaptive threshold selection (LRATS) method has been proposed to achieve finer segmentation [13]. In LRATS, a local threshold is determined by minimizing a predefined cost function. Denoting an ALR as L, and the kth fragment obtained with a threshold t within the ALR as L_k(t), the cost C(t) for the current threshold t is calculated as follows:

C(t) = \frac{1}{K} \sum_{k=1}^{K} \left[ \Phi_E(L_k(t)) + \Phi_A(L_k(t)) \right]    (4)

where K is the total number of segmented fragments obtained with the current threshold t, and \Phi_E(L_k(t)) and \Phi_A(L_k(t)) are two penalty functions that correspond to the ellipticity and area of the fragments, respectively. Intuitively, if the segmented fragments are close to an elliptical shape and the segmented areas are within the predefined area range [A_min, A_max], the cost function C(t) will have a small value. An epidermis area is shown in Fig. 8(a), and the keratinocytes segmentation result is shown in Fig. 8(b).

Fig. 8. An example of the keratinocytes segmentation result in the epidermis area: (a) shows a small portion of the epidermis in a WSI, (b) shows the automatically segmented keratinocytes (indicated by the thick bright contours) within the epidermis area (indicated by the thick black line), and (c) shows the automatically segmented melanocytes (indicated by the thick bright contours).
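The threshold search of Eq. (4) can be illustrated with the Python sketch below. The exact penalty functions Φ_E and Φ_A are defined in [13]; the solidity-based ellipticity penalty and hinge-style area penalty used here are simplified stand-ins, and nuclei are assumed to be darker than their surroundings within the ALR.

```python
# Illustrative sketch of LRATS threshold selection (Eq. (4)).
import numpy as np
from skimage.measure import label, regionprops

def lrats_threshold(gray_alr, alr_mask, a_min=100, a_max=500):
    """Pick the threshold minimizing C(t) inside one abnormally large region."""
    best_t, best_cost = None, np.inf
    values = np.unique(gray_alr[alr_mask])
    for t in values[1:-1]:                          # candidate thresholds
        frag_mask = alr_mask & (gray_alr < t)       # nuclei assumed darker than background
        props = regionprops(label(frag_mask))
        if not props:
            continue
        cost = 0.0
        for p in props:
            phi_e = 1.0 - p.solidity                # ~0 for convex, elliptical-ish fragments
            phi_a = max(0, a_min - p.area, p.area - a_max) / a_max   # outside [Amin, Amax]
            cost += phi_e + phi_a
        cost /= len(props)                          # average over the K fragments
        if cost < best_cost:
            best_t, best_cost = t, cost
    return best_t
```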
3.4. Melanocytes detection

In skin melanoma diagnosis, the architectural and cellular features (e.g., size, distribution, location) of the melanocytes in the epidermis area are important factors. In this module, the melanocytes are detected from the pre-segmented keratinocytes. In the epidermis area of the skin, a normal melanocyte is typically a cell with a dark nucleus, lying singly in the basal layer of the epidermis. However, in melanoma or nevus, the melanocytes grow abnormally and can be found in the middle layer of the epidermis. The digitized histopathological images used in this work are stained with hematoxylin and eosin (H&E). Three examples of skin epidermis images are shown in Fig. 9(a), (b), and (c). The cell nuclei are observed as dark blue, whereas the intra-cellular material and cytoplasm are observed as bright pink. Note that the bright seed points indicate the locations of melanocytes, whereas the other nuclei are other keratinocytes. It is observed that the difference between melanocytes and other keratinocytes lies in the surrounding region: the melanocytes generally have a brighter halo-like surrounding region and are retracted from other cells due to the shrinkage of the cytoplasm [31,14]. A close-up example of a melanocyte is shown in Fig. 9(d), where the outer dotted contour represents the halo region and the inner solid contour represents the nucleus. In contrast, the other keratinocytes are in close contact with the cytoplasm and have no or little brighter area. The brighter halo-like region of the melanocyte is therefore an important pattern for differentiating the melanocytes from the other keratinocytes.

In this module, we adopt one of our previous works for the melanocytes detection [15]. The schematic of the melanocyte detection technique is shown in Fig. 10. Based on the pre-segmented keratinocytes, the melanocyte detection technique first estimates the outer boundary of the halo region using the radial line scanning (RLS) method. At first, the RLS method initializes the radial center and radial lines for each pre-segmented keratinocyte (see Fig. 11(c)). On each radial line, the gradient information is calculated and the outer boundary points of the halo region are estimated (the inner boundary is already obtained by the keratinocytes segmentation module). Intuitively, the point on a radial line whose gradient direction points inwards is determined as the outer boundary point of the halo. Examples of the estimated halo regions are shown in Fig. 11(e). Once the halo region is estimated, the melanocytes are determined based on the ratio of the estimated halo region and the original nuclei region. The ratio, R_HN, is calculated as follows:

R_{HN} = A_{HR} / A_{NR}    (5)

where A_HR and A_NR are the areas of the halo and nuclei regions, respectively. For the melanocyte detection, we select a nuclei region as a melanocyte if its R_HN value is greater than a predefined threshold T_RHN (a sketch of this decision rule is given at the end of this subsection). Examples of the value R_HN are shown in Fig. 11(e). A melanocytes detection result is shown in Fig. 8(c).

The spatial distribution of the melanocytes in the epidermis area is an important diagnostic factor. In normal skin epidermis, the melanocytes are evenly distributed along the basal layer of the epidermis, and the ratio between the melanocytes and the other keratinocytes in skin tissue is between 1:10 and 1:20. In the case of melanotic nevus, there is a large number of melanocytes, and these melanocytes may form several nests at the junction of the epidermis and dermis area or penetrate into the dermis area. In the case of melanoma, the number of melanocytes is large as well, but not as large as in the case of nevus; however, the melanocytes may invade into the upper layers of the epidermis or deep into the dermis area. Based on the characteristics mentioned above, we utilize the number and location of the pre-segmented ROIs, i.e., keratinocytes and melanocytes, to construct the spatial distribution features for further analysis. In order to calculate the spatial distribution, we first determine the thickness of the epidermis. The spatial distribution features are then generated.

Fig. 9. Melanocytes in epidermis area from different skin samples. Inter- and intra-image variations are observed in terms of the color. These images are sampled from the skin WSI. In (a), (b), and (c), the bright seed points indicate the locations of melanocytes, whereas the other nuclei are keratinocytes. (d) is a close-up image of a melanocyte. (For interpretation of the references to color in this figure caption, the reader is referred to the web version of this paper.)

Fig. 10. The schematic of the melanocyte detection technique.

Fig. 11. Illustration of the RLS method: (a) is the original image with three nuclei regions, where the melanocyte is indicated with the letter 'M', (b) shows the segmented nuclei regions, (c) shows the radial lines for each nuclei region (nuclei regions are shown with thick contours), (d) shows the suppressed smoothed gradient map, and (e) shows the halo region estimation result. The value of the parameter R_HN is shown inside the nuclei region.
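A minimal sketch of the decision rule in Eq. (5) is given below. The radial line scanning of [15] that produces the halo estimate is not reproduced; the halo is taken here as the annulus between the nucleus boundary and the estimated outer boundary, and the mask-based inputs are hypothetical.

```python
# Sketch of the melanocyte decision rule, Eq. (5).
import numpy as np

def detect_melanocytes(nucleus_masks, halo_masks, t_rhn=0.8):
    """Return indices of candidate nuclei classified as melanocytes."""
    melanocytes = []
    for idx, (nuc, halo) in enumerate(zip(nucleus_masks, halo_masks)):
        a_nr = np.count_nonzero(nuc)             # nucleus area A_NR
        a_hr = np.count_nonzero(halo & ~nuc)     # halo area A_HR (region inside the
                                                 # estimated outer boundary, minus nucleus)
        if a_nr > 0 and a_hr / a_nr > t_rhn:     # R_HN = A_HR / A_NR > T_RHN
            melanocytes.append(idx)
    return melanocytes
```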
3.5. Features construction: spatial distribution features

In the previous modules, the epidermis area, the keratinocytes and the melanocytes in the epidermis area are segmented and detected by the automated techniques. These ROIs are now ready to be further analyzed by computer-aided techniques based on the pathologist's evaluation procedure. In the manual evaluation, several important factors are considered by a pathologist. In the proposed technique, these important factors are considered for the classification.

3.5.1. Local thickness measurement

The thickness of the epidermis varies significantly across the skin, and therefore it needs to be measured locally. The measurement method is described below:

Determine the boundary of the epidermis base: At first, based on the layout of the epidermis area (horizontal or vertical), the outer boundary of the epidermis is determined. Fig. 12(a) shows a skin tissue with horizontal layout and Fig. 12(b) shows a zoomed version of a local epidermis area. The boundary of the epidermis base is highlighted in Fig. 12(b).

Assign sampling points: On the pre-determined base boundary, we select a few points by fixed-interval sampling. Assuming that there are K points on the outer boundary B, and setting the interval to ε = 5, we obtain a set of boundary points as follows:

\Lambda = \{ p_i \in B \mid i = \varepsilon j,\ j = 1, \ldots, \lfloor K/\varepsilon \rfloor \}    (6)

Fig. 12(c) illustrates the sampling points (shown as bright dots) on the epidermis base boundary.

Measure local thickness: Based on the sampling points on the base boundary, the thickness of the epidermis is measured. Given a sampling point p_i ∈ B, we obtain its neighboring sampling points p_{i-1} and p_{i+1}. A curve C is estimated based on the three points p_i, p_{i-1} and p_{i+1} using a polynomial curve fitting method. A line T_i, which is tangent to the fitted curve at point p_i, is then calculated. The line N_i normal to the tangent line T_i reflects the orientation of the local epidermis area. The local depth measure D_i is calculated as the distance from the current sampling point p_i along the estimated normal line N_i until it reaches the inner boundary of the epidermis (i.e., the dermis–epidermis junction). This is illustrated in Fig. 12(b). For each sampling point, the local thickness is measured in this way.

Fig. 12. Epidermis depth measurement, and construction of sub-layers for the epidermis.
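A simplified Python sketch of this measurement is shown below, assuming the base boundary is given as an ordered array of (row, col) points and the epidermis as a binary mask; the chord between neighbouring sampling points replaces the polynomial fit of the paper, and the normal is assumed to point into the epidermis.

```python
# Sketch of local epidermis thickness measurement (Section 3.5.1).
import numpy as np

def local_thickness(base_pts, epidermis_mask, eps=5, max_depth=2000):
    """base_pts: ordered (N, 2) array of boundary points; returns local depths D_i."""
    depths = []
    pts = base_pts[::eps]                              # Eq. (6): fixed-interval sampling
    for i in range(1, len(pts) - 1):
        p_prev, p, p_next = pts[i - 1], pts[i], pts[i + 1]
        tangent = (p_next - p_prev).astype(float)      # local tangent from neighbours
        tangent /= np.linalg.norm(tangent) + 1e-9
        normal = np.array([-tangent[1], tangent[0]])   # rotate tangent by 90 degrees
        # march along the normal until the epidermis mask is left (junction reached)
        for d in range(1, max_depth):
            q = np.round(p + d * normal).astype(int)
            inside = (0 <= q[0] < epidermis_mask.shape[0] and
                      0 <= q[1] < epidermis_mask.shape[1] and
                      epidermis_mask[q[0], q[1]])
            if not inside:
                depths.append(d)                       # local depth D_i
                break
    return np.array(depths)
```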
3.5.2. Spatial distribution features

Based on the thickness measured in the last section, we divide the epidermis evenly into three sub-layers. Note that in normal skin, the melanocytes are typically present in the bottom layer of the epidermis, near the junction of the epidermis and dermis, known as the basal layer. In the case of superficial spreading melanoma, the melanocytes invade into the middle layer of the epidermis. Therefore, dividing the epidermis into three sub-layers has been found to be a good trade-off between simplicity and efficiency for quantifying the melanocyte distribution in the epidermis area. Let these three sub-layers be denoted by L_inner, L_middle, and L_outer. This is illustrated in Fig. 12(c) and (d), where the boundaries of the three layers are highlighted. The number of pre-segmented keratinocytes and melanocytes located within each of these three sub-layers is then calculated, and the ratio of the melanocytes to keratinocytes within these sub-layers is used as a set of features for the classification. Denote N_Kera and N_Mela as the numbers of keratinocytes and melanocytes in the epidermis area, respectively. The ratios of the numbers of melanocytes to keratinocytes in the inner, middle and outer layers are calculated as follows:

R_{MK}^{L_{inner}} = N_{Mela}^{L_{inner}} / N_{Kera}^{L_{inner}}    (7)

R_{MK}^{L_{middle}} = N_{Mela}^{L_{middle}} / N_{Kera}^{L_{middle}}    (8)

R_{MK}^{L_{outer}} = N_{Mela}^{L_{outer}} / N_{Kera}^{L_{outer}}    (9)

3.6. Features construction: morphological features of melanocytes

Besides the spatial distribution of the pre-segmented melanocytes, morphological features are also important for the classification, as they reflect the grading of the skin tissue under examination. The five morphological features used in this paper are presented below:

Area: The area of the detected nucleus.
Perimeter: The perimeter of the detected nucleus.
Eccentricity: The ratio of the distance between the foci of the best-fit ellipse and its major axis length.
Equivalent diameter: The diameter of a circle that has the same area as the measured region, i.e., \sqrt{4 \cdot \mathrm{Area} / \pi}.
Ellipticity: The ratio of the major axis length to the minor axis length of the best-fit ellipse.

For each feature, we calculate the mean and the standard deviation over the pre-segmented melanocytes in the epidermis area. This results in 10 morphological features. In total, a 13-dimensional feature vector F (three spatial distribution features and ten morphological features) is constructed for each skin WSI.

3.7. Classification

After the feature construction, a classification technique is applied to classify the WSI into three classes. In this work, the support vector machine (SVM) is used for the classification. While the SVM was originally proposed for two-class classification, it has been extended to multi-class classification [11,3,7]. The "one-against-one" multi-class SVM method has been shown to provide better performance than other multi-class SVM methods [7]. In the "one-against-one" method, the multi-class classification problem is decomposed into n(n−1)/2 two-class classification problems, where n is the number of classes. These n(n−1)/2 two-class SVMs are trained using the samples from the corresponding pairs of classes. The hyperplane separating the ith class and the jth class can be expressed as

(w^{ij})^T \phi(F) + b^{ij} = 0    (10)

where w^{ij} and b^{ij} are the coefficients of the hyperplane, and \phi(\cdot) is the mapping function. In the testing phase, the testing sample is labeled by all the n(n−1)/2 two-class SVMs. If the current SVM labels the sample as the ith class, then the vote count of the ith class is increased by 1. After the testing sample has been labeled by all n(n−1)/2 two-class SVMs, the accumulated votes are counted, and the class label with the maximum number of votes is assigned to the testing sample. In this paper, the SVM with a linear kernel is utilized for the classification.
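As an illustration, the 13-dimensional feature vector and the one-against-one linear SVM could be assembled as follows. The mask-based inputs and helper names are hypothetical, and scikit-learn's SVC (which trains one-vs-one binary SVMs internally and combines them by voting) stands in for the authors' own implementation.

```python
# Sketch of feature construction (Sections 3.5.2, 3.6) and classification (Section 3.7).
import numpy as np
from skimage.measure import label, regionprops
from sklearn.svm import SVC

def morphological_features(melanocyte_mask):
    """Mean and standard deviation of the five nucleus-level measurements."""
    props = regionprops(label(melanocyte_mask))
    per_nucleus = np.array([[p.area,
                             p.perimeter,
                             p.eccentricity,
                             p.equivalent_diameter,                       # sqrt(4*Area/pi)
                             p.major_axis_length / (p.minor_axis_length + 1e-9)]
                            for p in props])
    return np.concatenate([per_nucleus.mean(axis=0), per_nucleus.std(axis=0)])  # 10 values

def build_feature_vector(layer_counts, melanocyte_mask):
    """layer_counts: dict with melanocyte/keratinocyte counts per sub-layer."""
    ratios = [layer_counts['mela'][k] / max(layer_counts['kera'][k], 1)
              for k in ('inner', 'middle', 'outer')]                      # Eqs. (7)-(9)
    return np.concatenate([ratios, morphological_features(melanocyte_mask)])  # 13-D vector F

# Training: X is an (n_slides x 13) feature matrix, y the class labels.
clf = SVC(kernel='linear', decision_function_shape='ovo')   # one-against-one linear SVM
# clf.fit(X_train, y_train); y_pred = clf.predict(X_test)
```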
4. Experimental results

4.1. Image dataset

In this study, we have evaluated the proposed technique on 66 different whole slide cutaneous histopathological images. The histological sections used for image acquisition were prepared from formalin-fixed, paraffin-embedded tissue blocks of skin biopsies. The prepared sections are about 4 µm thick and are stained with H&E using an automated stainer. The skin samples include superficial spreading melanoma, melanocytic nevus, and normal skin. Table 2 shows the distribution of these three classes. The digital images were captured at 40× magnification with a Carl Zeiss MIRAX MIDI scanner (Carl Zeiss Inc., Germany).

Table 2. Description of the image dataset.

Class                           | No. | Percentage (%)
Superficial spreading melanoma  | 32  | 48.48
Melanocytic nevus               | 17  | 25.76
Normal skin                     | 17  | 25.76
Total                           | 66  | 100

4.2. Classification results

We use 10-fold cross validation [2] for the evaluation. In other words, we first divide the whole dataset into 10 subsets. We then use 9 subsets as the training set and the remaining subset as the testing set, and obtain the performance. We repeat this procedure 10 times with different testing sets. We also repeat the 10-fold cross validation 100 times, and the average performance is used as the final classification performance. The results are summarized in Table 3.

Table 3. Performance evaluations.

Technique                                                     | Classification accuracy (%)
TCA technique                                                 | 70
Proposed technique with all (13) features                     | 89.07 ± 2.72
Proposed technique with only distribution-based (3) features  | 77.41 ± 1.19
Proposed technique with only morphological (10) features      | 83.21 ± 2.29

It is observed that the proposed technique is able to classify most of the skin tissues correctly when all features are used. If only the distribution-based features are used, the classification accuracy drops to about 78%. If only the morphological features are used, about 84% accuracy is achieved.
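The evaluation protocol above can be sketched with standard library calls; the use of stratified folds and of the linear-kernel SVM from Section 3.7 are assumptions about details the paper does not spell out.

```python
# Sketch of the evaluation protocol of Section 4.2: 10-fold cross validation
# repeated 100 times, with the mean accuracy reported.
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.svm import SVC

def evaluate(X, y, n_repeats=100, seed=0):
    cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=n_repeats, random_state=seed)
    scores = cross_val_score(SVC(kernel='linear'), X, y, cv=cv, scoring='accuracy')
    return scores.mean(), scores.std()          # reported as mean +/- standard deviation
```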
Since, to the authors' knowledge, there is no existing technique for skin WSI classification in the literature, we have adopted and implemented the TCA technique [28] for comparison. The classification accuracy of the TCA technique is shown in the first row of Table 3. In the implementation of the TCA technique, we randomly selected 12 WSIs for the training phase and used the remaining WSIs for the testing phase. In the training phase, each WSI is first divided into small image blocks. In each WSI, the background blocks and the tissue blocks are manually labeled. Within the tissue blocks, the blocks containing cells (denoted as cell blocks) and the blocks containing other cytological components are manually labeled. Within the cell blocks, the blocks containing malignant cells and the blocks containing benign cells are manually labeled. The features extracted from these three pairs of blocks (i.e., background versus tissue blocks, cell blocks versus other cytological component blocks, and malignant versus benign cell blocks) are then used to train three classifiers. In the testing phase, the WSI is divided into small image blocks, and each block is classified by the pre-trained classifiers. The percentages of the malignant cell blocks and benign cell blocks with respect to the total number of blocks are then calculated. If the percentage of the malignant cell blocks is dominant (greater than 50%), the WSI is determined to be a melanoma, otherwise a normal or nevus tissue. Note that in the implementation of the TCA technique, we consider a two-class classification problem, i.e., malignant tissue (melanoma) versus benign tissue (nevus and normal skin).

It is observed from Table 3 that the classification accuracy of the TCA technique is 70%, which is lower than that of the proposed technique. One reason for this is that the TCA technique only considers pixel-level features, so staining variations degrade its classification performance, whereas the proposed technique considers object-level features, which are less affected by staining variations. Another drawback of the TCA technique is that it requires user interaction for the selection of training blocks, whereas the proposed technique requires less user interaction.

An example of misclassification by the proposed technique is shown in Fig. 13. This skin tissue belongs to the melanoma class (in-transit metastasis type). The proposed technique misclassified this skin tissue as the normal class. This is because this melanoma skin tissue is an in-transit metastasis, and hence the spatial distribution and the morphological features in the epidermis of the WSI look very similar to those of normal skin. Therefore, the proposed technique misclassifies this melanoma tissue as normal. Incorporating additional features from the dermis area would be required to classify it correctly as melanoma.

Fig. 13. An example of a misclassified skin tissue: (a) a snapshot of a skin WSI and (b) a zoomed-in image of the rectangular area in (a).

4.3. Importance of features

We evaluated the importance of each individual feature by evaluating its contribution to the classification. Instead of using all 13 features, we used a single feature at a time out of the 13 features for the classification and recorded the corresponding classification results. Table 4 shows the top-8 most important features according to the classification accuracy.

Table 4. Top-8 important features.

Index | Feature name              | Classification accuracy (%)
1     | Eccentricity-Std          | 66.77 ± 3.38
2     | R_MK^L_inner              | 64.64 ± 3.73
3     | Perimeter-Mean            | 59.68 ± 4.41
4     | R_MK^L_outer              | 59.54 ± 6.04
5     | Equivalent Diameter-Mean  | 59.45 ± 5.31
6     | Area-Mean                 | 57.62 ± 6.16
7     | R_MK^L_middle             | 55.85 ± 2.87
8     | Ellipticity-Mean          | 54.22 ± 6.87

4.4. Parameter selections

Several parameters are used in the proposed framework. In the keratinocytes segmentation module, there is a pair of parameters defining the predefined area range [A_min, A_max]. These parameters are used for calculating the cost function C(t) in Eq. (4) to find the optimum threshold for keratinocytes segmentation. The area range [A_min, A_max] defines the possible size range of keratinocytes in the image. If the range deviates greatly from the real situation, the keratinocytes segmentation result will be degraded and the final classification result will not be accurate. For example, under a certain magnification, a typical keratinocyte contains approximately 300 pixels and an atypical keratinocyte about 400 pixels. If the range is set to [100, 800], larger segmented fragments will be included in the final segmentation result, leading to under-segmentation (many nuclei being treated as one object). This will affect the spatial distribution features described in Section 3.5.2, where the counts of keratinocytes are used. The area range [A_min, A_max] can easily be estimated using a few sample images with keratinocytes. In the proposed framework, [A_min, A_max] is set to [100, 500].
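As an illustration of how such a range could be estimated from a few annotated sample images, one might take robust percentiles of the observed nucleus areas; this particular rule is an assumption for the sketch, not the procedure prescribed in the paper.

```python
# Illustrative estimation of [Amin, Amax] from annotated sample nucleus masks.
import numpy as np
from skimage.measure import label, regionprops

def estimate_area_range(sample_nucleus_masks, lo=1, hi=99):
    areas = np.concatenate([[p.area for p in regionprops(label(m))]
                            for m in sample_nucleus_masks])
    return int(np.percentile(areas, lo)), int(np.percentile(areas, hi))
```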
In order to evaluate the effect of tuning the area range [A_min, A_max], we set A_min = 100 and vary A_max from 200 to 1000 with a step of 50. The corresponding classification accuracy is shown in Fig. 14(a). It is observed that the classification accuracy is above 85% when A_max is around 300–700, whereas the classification performance decreases when A_max is larger than 700.

In the melanocytes detection module, there is a parameter T_RHN (Section 3.4) that determines whether a keratinocyte is a melanocyte or another cell type. In this module, the ratio between the areas of the halo and nuclei regions, R_HN, is used as a descriptor to differentiate the melanocytes from other types of cells. Thus the threshold T_RHN is important for achieving a high detection accuracy. Similar to the parameter [A_min, A_max] in the keratinocytes segmentation module, the threshold T_RHN affects the spatial distribution features described in Section 3.5.2, where the counts of melanocytes are used. The parameter T_RHN can be determined by using a few sample images of epidermis containing melanocytes and keratinocytes. In the proposed framework, T_RHN is set to 0.8. In order to evaluate the effect of tuning T_RHN, we vary it from 0.1 to 1.3 with a step of 0.1 and evaluate the classification accuracy. Fig. 14(b) shows the evaluation result. It is observed that the classification accuracy is above 85% when T_RHN is around 0.4–1.

Fig. 14. Evaluation of parameter selection: (a) parameter A_max in [200, 1000] and the corresponding classification accuracy, and (b) parameter T_RHN in [0.1, 1.3] and the corresponding classification accuracy.

4.5. Computational complexity evaluation

All experiments were carried out on a 2.4-GHz Intel Core 2 Duo CPU with 3-GB RAM using MATLAB 7.04. On average, the proposed technique takes about 36 min to finish the analysis of a 12,000 × 10,000 pixel color WSI, which contains about 2300 nuclei and 350 melanocytes in the epidermis area. The epidermis segmentation module takes 0.5% of the whole processing time, the keratinocytes segmentation module 2.6%, the melanocytes detection module 94.4%, and the feature construction and classification modules together 2.5%.

5. Conclusion

In this paper, we present an automatic analysis and classification technique for the melanotic skin WSI. The epidermis area, keratinocytes, and melanocytes are first segmented by automatic techniques. The spatial distribution and morphological features based on the pre-segmented ROIs are then extracted and classified. The proposed technique provides about 90% accuracy in the classification of melanotic nevus, melanoma, and normal skin. In the future, features in the dermis area will be analyzed in order to improve the classification performance.

Conflict of interest

None declared.

Acknowledgment

The authors would like to acknowledge the funding of the National Natural Science Foundation of China (Grant no. 61401263) and the Fundamental Research Funds for the Central Universities of China (Grant no. GK201402037). We thank Dr. Naresh Jha and Dr. Muhammad Mahmood of the University of Alberta Hospital for providing the images and giving helpful advice.

References

[1] American Cancer Society, What are the Key Statistics About Melanoma? Technical Report, American Cancer Society, 2008.
[2] C. Bishop, Neural Networks for Pattern Recognition, Oxford University Press, Oxford, UK, 1995.
[3] K. Crammer, Y. Singer, On the learnability and design of output codes for multiclass problems, Mach. Learn. 47 (2) (2002) 201–233.
[4] S. Doyle, M. Feldman, J. Tomaszewski, A. Madabhushi, A boosted Bayesian multi-resolution classifier for prostate cancer detection from digitized needle biopsies, IEEE Trans. Biomed. Eng. 59 (5) (2012) 1205–1218.
[5] R. Gonzalez, R. Woods, Digital Image Processing, Prentice Hall, 2002.
[6] R. Haralick, K. Shanmugam, I. Dinstein, Textural features for image classification, IEEE Trans. Syst. Man Cybern. 3 (6) (1973) 610–621.
[7] C. Hsu, C. Lin, A comparison of methods for multiclass support vector machines, IEEE Trans. Neural Netw. 13 (2) (2002) 415–425.
[8] C. Huang, A. Veillard, L. Roux, N. Lomenie, D. Racoceanu, Time-efficient sparse analysis of histopathological whole slide images, Comput. Med. Imaging Graph. 35 (7) (2010) 579–591.
[9] S. Ismail, A. Colclough, J. Dinnen, D. Eakins, D. Evans, E. Gradwell, J. O'Sullivan, J. Summerell, R. Newcombe, Observer variation in histopathological diagnosis and grading of cervical intraepithelial neoplasia, Br. Med. J. 298 (6675) (1989) 707.
[10] J. Kong, O. Sertel, H. Shimada, K.L. Boyer, J.H. Saltz, M.N. Gurcan, Computer-aided evaluation of neuroblastoma on whole-slide histology images: classifying grade of neuroblastic differentiation, Pattern Recognit. 42 (6) (2009) 1080–1092.
[11] U. Kreßel, Pairwise classification and support vector machines, in: Advances in Kernel Methods, MIT Press, Cambridge, MA, 1999, pp. 255–268.
[12] V. Kumar, A. Abbas, N. Fausto, et al., Robbins and Cotran Pathologic Basis of Disease, Elsevier Saunders, Philadelphia, 2005.
[13] C. Lu, M. Mahmood, N. Jha, A robust automatic nuclei segmentation technique for quantitative histopathological image analysis, Anal. Quant. Cytol. Histopathol. 12 (2012) 296–308.
[14] C. Lu, M. Mahmood, N. Jha, M. Mandal, Automated segmentation of the melanocytes in skin histopathological images, IEEE J. Biomed. Health Inform. 17 (2) (2013) 284–296.
[15] C. Lu, M. Mahmood, N. Jha, M. Mandal, Detection of melanocytes in skin histopathological images using radial line scanning, Pattern Recognit. 46 (2) (2013) 509–518.
[16] C. Lu, M. Mandal, Automated segmentation and analysis of the epidermis area in skin histopathological images, in: 2012 Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), 2012, pp. 5355–5359.
[17] I. Maglogiannis, C. Doukas, Overview of advanced computer vision systems for skin lesions characterization, IEEE Trans. Inf. Technol. Biomed. 13 (5) (2009) 721–733.
[18] M. Mete, X. Xu, C.-Y. Fan, G. Shafirstein, Automatic delineation of malignancy in histopathological head and neck slides, BMC Bioinform. 8 (Suppl. 7) (2007) S17, http://dx.doi.org/10.1186/1471-2105-8-S7-S17.
[19] M. Niethammer, D. Borland, J. Marron, J. Woosley, N.E. Thomas, Appearance normalization of histology slides, in: Machine Learning in Medical Imaging, Springer, Berlin, Heidelberg, 2010, pp. 58–66.
[20] N. Otsu, A threshold selection method from gray-level histograms, IEEE Trans. Syst. Man Cybern. 9 (1) (1979) 62–66.
[21] S. Petushi, F.U. Garcia, M.M. Haber, C. Katsinis, A. Tozeren, Large-scale computations on histology images reveal grade-differentiating parameters for breast cancer, BMC Med. Imaging 6 (2006) 14.
[22] D. Rigel, J. Russak, R. Friedman, The evolution of melanoma diagnosis: 25 years beyond the ABCDs, CA Cancer J. Clin. 60 (5) (2010) 301–316.
[23] M. Rojo, G. García, C. Mateos, J. García, M. Vicente, Critical comparison of 31 commercially available digital slide systems in pathology, Int. J. Surg. Pathol. 14 (4) (2006) 285–305.
[24] V. Roullier, O. Lezoray, V. Ta, A. Elmoataz, Multi-resolution graph-based analysis of histopathological whole slide images: application to mitotic cell extraction and visualization, Comput. Med. Imaging Graph. (2011).
[25] O. Sertel, J. Kong, H. Shimada, U.V. Catalyurek, J.H. Saltz, M.N. Gurcan, Computer-aided prognosis of neuroblastoma on whole-slide images: classification of stromal development, Pattern Recognit. 42 (6) (2009) 1093–1103.
[26] N. Signolle, M. Revenu, B. Plancoulaine, P. Herlin, Wavelet-based multiscale texture segmentation: application to stromal compartment characterization on virtual slides, Signal Process. 90 (8) (2010) 2412–2422.
[27] J. Smolle, Computer recognition of skin structures using discriminant and cluster analysis, Skin Res. Technol. 6 (2) (2000) 58–63.
[28] J. Smolle, A. Gerger, Tissue counter analysis of tissue components in skin biopsies—evaluation using CART (classification and regression trees), Am. J. Dermatopathol. 25 (3) (2003) 215–222.
[29] M. Vestergaard, P. Macaskill, P. Holt, S. Menzies, Dermoscopy compared with naked eye examination for the diagnosis of primary melanoma: a meta-analysis of studies performed in a clinical setting, Br. J. Dermatol. 159 (3) (2008) 669–676.
[30] Y. Wang, D. Crookes, O.S. Eldin, S. Wang, P. Hamilton, J. Diamond, Assisted diagnosis of cervical intraepithelial neoplasia (CIN), IEEE J. Sel. Top. Signal Process. 3 (1) (2009) 112–121.
[31] D. Weedon, G. Strutton, Skin Pathology, Churchill Livingstone, New York, 2002.
[32] R. Weinstein, A. Graham, L. Richter, G. Barker, E. Krupinski, A. Lopez, K. Erps, A. Bhattacharyya, Y. Yagi, J. Gilbertson, Overview of telepathology, virtual microscopy, and whole slide imaging: prospects for the future, Hum. Pathol. 40 (8) (2009) 1057.

Cheng Lu received his Ph.D. degree in electrical and computer engineering from the University of Alberta in 2013. He is now working at Shaanxi Normal University, China. He is the recipient of the Graduate Student Interdisciplinary Research Award 2012, University of Alberta. His research interests include medical image analysis, pattern recognition and computer vision. He is an author or coauthor of more than ten papers in leading international journals and conferences.

Mrinal Mandal is a Full Professor and an Associate Chair in the Department of Electrical and Computer Engineering, and the Director of the Multimedia Computing and Communications Laboratory, at the University of Alberta, Edmonton, Canada. He has authored the book Multimedia Signals and Systems (Kluwer Academic) and coauthored the book Continuous and Discrete Time Signals and Systems (Cambridge University Press). His current research interests include multimedia, image and video processing, multimedia communications, and medical image analysis. He has published over 140 papers in refereed journals and conferences, and holds a US patent on a lifting wavelet transform architecture. He has been the Principal Investigator of projects funded by Canadian Networks of Centres of Excellence such as CITR and MICRONET, and is currently the Principal Investigator of a project funded by NSERC. He was a recipient of the Canadian Commonwealth Fellowship from 1993 to 1998, and the Humboldt Research Fellowship from 2005 to 2006 at the Technical University of Berlin.