CSC 400 - Graduate Problem Seminar - Project Report

A Deep Learning Approach to Universal Skin Disease Classification

Haofu Liao
University of Rochester, Department of Computer Science
haofu.liao@rochester.edu

Abstract

Skin diseases are very common in daily life. Each year, millions of people in America are affected by all kinds of skin disorders. Diagnosis of skin diseases sometimes requires a high level of expertise due to the variety of their visual aspects. As human judgment is often subjective and hardly reproducible, a computer-aided diagnostic system should be considered to achieve a more objective and reliable diagnosis. In this paper, we investigate the feasibility of constructing a universal skin disease diagnosis system using deep convolutional neural networks (CNNs). We train the CNN architecture on 23,000 skin disease images from the Dermnet dataset and test its performance on images from both Dermnet and OLE, another skin disease dataset. Our system achieves as high as 73.1% Top-1 accuracy and 91.0% Top-5 accuracy when testing on the Dermnet dataset. On the OLE dataset, the Top-1 and Top-5 accuracies are 31.1% and 69.5%, respectively. We show that these accuracies can be further improved if more training images are used.

1. Introduction

Skin diseases are among the most commonly seen disorders. Due to the disfigurement and associated hardships, skin disorders cause a great deal of trouble for sufferers [13]. For skin cancer in particular, the facts and figures are more serious. In the United States, skin cancer is the most common form of cancer. According to a 2012 statistical study, over 5.4 million cases of nonmelanoma skin cancer, including basal cell carcinoma and squamous cell carcinoma, were treated among more than 3.3 million people in America [20]. Each year, the number of new cases of skin cancer exceeds the combined incidence of cancers of the breast, prostate, lung, and colon [24]. Research also shows that, over the course of a lifetime, one in five Americans will develop a skin cancer [19].

However, the diagnosis of skin disease is challenging. To diagnose a skin disease, a variety of visual clues may be used, such as the individual lesional morphology, the body site distribution, color, scaling, and arrangement of lesions. When the individual components are analyzed separately, the recognition process can be quite complex [6, 15]. For example, melanoma, a well-studied skin cancer, has four major clinical diagnosis methods: the ABCD rules, pattern analysis, the Menzies method, and the 7-point checklist. Using these methods to achieve good diagnostic accuracy requires a high level of expertise, as the differentiation of skin lesions demands a great deal of experience [30].

Unlike diagnosis by human experts, which depends heavily on subjective judgment and is hardly reproducible [16, 17, 26], a computer-aided diagnostic system is more objective and reliable. By using well-crafted feature extraction algorithms combined with popular classifiers (e.g., SVMs and ANNs), current state-of-the-art computer-aided diagnostic systems [2, 31, 11, 22, 3] can achieve very good performance on certain skin cancers such as melanoma, but they are unable to perform diagnosis over broader classes of skin diseases. Human-engineered feature extraction is not suitable for a universal skin disease classification system. On the one hand, hand-crafted features are usually dedicated to one or a limited number of skin diseases.
They can hardly be applied to other classes and datasets. On the other hand, due to the diverse nature of skin diseases [6], engineering features by hand for every skin disease is unrealistic. One way to solve this problem is to use feature learning [4], which eliminates the need for feature engineering and lets the machine decide which features to use. Many feature-learning-based classification systems have been proposed in the past few years [5, 8, 7, 28, 29, 1]. However, they have mostly been restricted to dermoscopy or histopathology images, and they mainly focus on the detection of mitosis, an indicator of cancer [28].

In recent years, deep convolutional neural networks (CNNs) have become very popular in feature learning and object classification. High-performance GPUs make it possible to train a network on a large-scale dataset and thus yield better performance. Many results [9, 23, 12, 27, 25] from the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) [21] show that state-of-the-art CNN architectures are able to surpass humans in object classification. Recently, Esteva et al. [10] proposed a CNN-based universal skin disease classification system. They train their network by fine-tuning the VGG16 and VGG19 architectures [25]. Their network achieved 60.0% Top-1 and 80.3% Top-3 classification accuracy, significantly outperforming the human specialists in their experiment.

Inspired by [10], we conducted research along a similar path and achieved better accuracy. We use skin images from two different sources. First, we collect images from Dermnet (www.dermnet.com), a publicly available dataset of more than 23,000 dermatologist-curated skin disease images. Second, we obtain 1,300 skin images from the New York State Department of Health; we call this dataset OLE. We fine-tune our CNNs from the VGG16, VGG19, and GoogleNet [27] models and test them on the two datasets. The results show that our CNNs can achieve 73.1% Top-1 and 91.0% Top-5 accuracy on the Dermnet dataset, and 31.1% Top-1 and 69.5% Top-5 accuracy on the OLE dataset.

The rest of this report is organized as follows. Section 2 introduces the datasets and the CNN models we use to construct the CNN architecture. Section 3 investigates the performance of the CNNs under different training and test data settings. Conclusions and future work are given in Section 4.

2. Method

2.1. Dataset

We build our skin disease dataset from two different sources: Dermnet and OLE. Dermnet is one of the largest publicly available photo dermatology sources. It has more than 23,000 skin disease images covering a wide variety of skin conditions. Dermnet organizes the skin diseases biologically in a two-level taxonomy. The bottom level contains more than 600 skin diseases at a fine granularity. The top level contains 23 skin disease classes, as shown in Table 1. Each top-level skin disease class contains a subcollection of the bottom-level skin diseases. We adopt the Dermnet taxonomy for our classification system and use the 23 top-level skin disease classes to label all the skin disease images.

The OLE dataset contains more than 1,300 skin disease images from the New York State Department of Health. It covers 19 skin diseases, each of which maps to one of the bottom-level skin diseases in the Dermnet taxonomy. Hence, we further label these 19 skin diseases with their top-level counterparts in the Dermnet taxonomy. The labeling results are shown in Table 2.

To prepare the dataset, we need to download the 23,000 skin disease images from Dermnet. Since there is no direct link or API for these images, we download them by parsing their addresses and sending HTTP requests to the web server. The downloaded images are not well labeled and contain watermarks, and their naming format is not consistent. Therefore, multiple name-analysis strategies are used to extract class information from the image names. Further, a two-level hierarchical map is established using the extracted class information and the Dermnet taxonomy. All the images from Dermnet are originally labeled with the bottom-level classes. Since we only consider the 23 top-level classes, we merge each bottom-level class according to the two-level hierarchical map and label each image with one of the 23 top-level classes. During the construction of the dataset, if the name of an image contains no class information, or if the number of images in a bottom-level class is small, we discard those images. Figure 1(a) shows the number of images in each of the 23 top-level skin disease classes.

For each image in the OLE dataset, we first parse its file name to get the corresponding skin disease. Then, we label it using the disease-label map in Table 2. The number of images for each skin disease in the OLE dataset is given in Figure 1(b).

[Figure 1: (a) The number of Dermnet images in each of the 23 top-level classes. (b) The number of OLE images for each skin disease in Table 2. The x-axis gives the class labels; each label is associated with a skin disease class in Table 1. The bars in red mark the classes we delete in Section 3.2.]
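The report does not include the crawling and labeling code itself; the following is a minimal Python sketch of the labeling strategy described above, under stated assumptions. The BOTTOM_TO_TOP map (bottom-level disease to one of the 23 top-level labels in Table 1), the MIN_CLASS_SIZE threshold for discarding small bottom-level classes, and the file-name convention are all hypothetical stand-ins for the multiple name-analysis strategies actually used.

    import os

    # Hypothetical two-level hierarchical map built from the Dermnet taxonomy:
    # bottom-level disease -> one of the 23 top-level labels (Table 1).
    BOTTOM_TO_TOP = {"tinea-versicolor": 18, "melanoma": 11}  # ...~600 entries
    MIN_CLASS_SIZE = 10  # assumed threshold for "small" bottom-level classes

    def bottom_class(filename):
        """Extract the bottom-level class from a Dermnet image file name.

        We assume file names embed the disease name followed by an index,
        e.g. 'tinea-versicolor-23.jpg'.  Names that carry no class
        information yield None, and those images are discarded.
        """
        stem = os.path.splitext(os.path.basename(filename))[0]
        key = "-".join(p for p in stem.split("-") if not p.isdigit())
        return key if key in BOTTOM_TO_TOP else None

    def build_label_file(image_paths, out_txt="train.txt"):
        """Write '<path> <top-level label>' lines for Caffe's ImageData layer."""
        by_class = {}
        for path in image_paths:
            bottom = bottom_class(path)
            if bottom is not None:
                by_class.setdefault(bottom, []).append(path)
        with open(out_txt, "w") as f:
            for bottom, paths in by_class.items():
                if len(paths) < MIN_CLASS_SIZE:
                    continue  # too few photos of this bottom-level disease
                for path in paths:
                    f.write("%s %d\n" % (path, BOTTOM_TO_TOP[bottom]))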
Top-level skin disease categories from Dermnet:

0. Acne and Rosacea
1. Malignant Lesions
2. Atopic Dermatitis
3. Bullous Disease
4. Bacterial Infections
5. Eczema
6. Exanthems & Drug Eruptions
7. Hair Diseases
8. STDs
9. Pigmentation Disorders
10. Connective Tissue Diseases
11. Melanoma, Nevi & Moles
12. Nail Diseases
13. Contact Dermatitis
14. Psoriasis & Lichen Planus
15. Infestations & Bites
16. Benign Tumors
17. Systemic Disease
18. Fungal Infections
19. Urticaria
20. Vascular Tumors
21. Vasculitis
22. Viral Infections

Table 1: The 23 top-level categories of the Dermnet taxonomy. We use these top-level categories to label images from both Dermnet and OLE. Each category is assigned a numeric label ranging from 0 to 22. Due to layout limitations, long category names are shortened.

Subclass Name             Label
Rosacea                   0
Actinic Keratosis         1
Basal Cell Carcinoma      1
Squamous Cell Carcinoma   1
Atopic Dermatitis         2
Verruca                   22
Nummular Eczema           5
Lupus Erythematosus       10
Melanoma                  11
Melanocytic Nevus         11
Contact Dermatitis        13
Lichen Planus             14
Pityriasis Rosea          14
Psoriasis                 14
Seborrheic Keratosis      16
Tinea Corporis            18
Tinea Versicolor          18
Urticaria                 19
Herpes                    22

Table 2: The 19 skin diseases of the OLE dataset. We label them according to the Dermnet taxonomy, i.e., each subclass is assigned the numeric label of its top-level counterpart in Table 1.

2.2. CNN Architecture

It has been shown that, in many cases, transfer learning can be used to train a deep CNN efficiently [18, 32]. In transfer learning, instead of training the network from randomly initialized parameters, one takes a pretrained network and fine-tunes its weights by continuing the backpropagation. This works because the outputs of the early layers of a well-trained network usually contain generic features. Such generic features, for example edges and blobs, are useful in many tasks, and for a new dataset they can be applied directly.

In our approach, we do transfer learning by fine-tuning ImageNet [21] pretrained models with Caffe [14], a deep learning framework that supports expressive and efficient deep CNN training. We choose VGG16, VGG19, and GoogleNet as our pretrained models. All of them were trained on the ImageNet dataset and proved to have very good performance in the ILSVRC-2014 competition: VGG16 and VGG19 placed 1st in the task 2a challenge, and GoogleNet won the task 1b challenge. In our experiments, we fine-tune these models with our skin disease datasets and run all training and testing on an NVIDIA Titan Black GPU to accelerate the computation.
Fine-tuning with Caffe requires modifying the network definition (the deploy protocol or the train-validation protocol) of the pretrained models. First, since we train the network on our skin disease datasets, the data layer of the pretrained network must be redefined. The following is an example of the data layer we use for our datasets:

    layer {
      name: "data"
      type: "ImageData"
      top: "data"
      top: "labels"
      include {
        phase: TRAIN
      }
      transform_param {
        mirror: true
        crop_size: 224
        mean_file: "imagenet_mean.binaryproto"
      }
      image_data_param {
        source: "train.txt"
        batch_size: 32
        new_height: 256
        new_width: 256
      }
    }

In the data layer, we specify the input dataset with the "source" parameter, which names a text file in which each line gives an image filename and a label. We set the input image size and the number of images to process at a time with the "new_height", "new_width", and "batch_size" parameters. We also define the preprocessing of the input images with the "mirror", "crop_size", and "mean_file" parameters. These are set to the same values as in the pretrained models, so that the input to the fine-tuned network is at the same scale as the input to the pretrained network.

For the last fully connected layer (the one that outputs the scores of the skin disease classes), we change the number of outputs to 23, matching Dermnet's 23 top-level classes, so that it outputs scores indicating the classification of the input skin disease image. We want this layer to be trained from scratch so that its weights fit our dataset rather than the pretrained model. We therefore change its name, which causes Caffe to initialize it with random weights instead of copying the pretrained ones, and we train it with a higher learning rate to get faster convergence. Finally, compared with the deploy protocol, the train-validation protocol replaces the softmax layer with a softmax-with-loss layer so that a loss function can be applied during training.
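The report does not list the commands used to launch training; the following is a minimal PyCaffe sketch of how such a fine-tuning run could be driven, assuming a solver definition named solver.prototxt that points to the modified train-validation protocol, the publicly released VGG19 weights file, and a hypothetical "fc8-skin" name for the renamed final layer. The key step is copy_from, which copies weights for every layer whose name matches the pretrained model and leaves renamed layers randomly initialized.

    import caffe

    caffe.set_mode_gpu()
    caffe.set_device(0)

    # The solver definition (assumed name) points to the modified
    # train-validation protocol with the renamed 23-way classifier and the
    # softmax-with-loss layer described above.
    solver = caffe.get_solver("solver.prototxt")

    # Copy weights layer by layer, matched by name, from the pretrained model.
    # The renamed final layer finds no match and keeps its random initialization.
    solver.net.copy_from("VGG_ILSVRC_19_layers.caffemodel")

    # Sanity check on the new classifier ("fc8-skin" is a hypothetical name):
    # its weight blob should map 4096 fc7 features to 23 class scores.
    print(solver.net.params["fc8-skin"][0].data.shape)  # (23, 4096)

    solver.solve()  # run fine-tuning until the solver's max_iter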
3. Experiments

3.1. Test on the Dermnet Dataset

In our first experiment, we train and test the CNNs using the Dermnet dataset only. We label all the Dermnet images using the labeling strategies introduced in Section 2.1. After construction, we get a set of 17,630 labeled images. We randomly pick 16,630 of them as the training set and the remaining 1,000 as the test set. Then, we fine-tune three CNNs from the three ImageNet pretrained models (VGG16, VGG19, and GoogleNet), respectively. The Top-1 and Top-5 accuracies of the networks, i.e., the percentage of test images whose ground truth matches the Top-1 prediction (one of the Top-5 predictions), are given in Table 3. All three networks achieve good classification results, with VGG19 performing slightly better on Top-1 accuracy, probably because it has more layers (19) than the other models.

    CNN Model    Top-1 Accuracy    Top-5 Accuracy
    VGG16        72.7%             90.9%
    VGG19        73.1%             90.7%
    GoogleNet    71.8%             91.0%

Table 3: Top-1 and Top-5 accuracies of the CNNs fine-tuned from different ImageNet pretrained models. All CNNs are trained and tested using the Dermnet images only.

Some example images along with their predictions are given in Figure 2. The ground truth is given at the top of each image, and the Top-5 predictions together with their probabilities are given below. Melanoma, psoriasis, basal cell carcinoma, and systemic disease are predicted correctly with high confidence. The bullous disease image is misclassified with divergent predictions. Viral infections and fungal infections only hit the Top-5 predictions with very low confidence, and the benign tumors image is even misclassified as cellulitis with a high-probability prediction. We further analyze these misclassified cases in Section 3.2.

[Figure 2: Prediction results output by the fine-tuned GoogleNet network. The label at the top of each image is the ground truth; the Top-5 predictions and the corresponding probabilities are given at the bottom of each image.]

The confusion matrices of the three networks are given in Figure 3. The index of each row in a confusion matrix corresponds to a true label, and the indices of the columns denote the predicted labels. The color of each cell represents the probability of the prediction, and the number in each cell gives the number of occurrences of that prediction. In all three confusion matrices, the high-confidence predictions are located along the diagonal. This indicates that the networks do not suffer from high error rates on particular classes and that, most of the time, the predictions are correct.

[Figure 3: The confusion matrices of the (a) GoogleNet, (b) VGG16, and (c) VGG19 based CNNs. All three CNNs are trained and tested using the Dermnet images only.]
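As an illustration of this metric, here is a small PyCaffe/NumPy sketch of how the Top-1 and Top-5 accuracies could be computed over a labeled test list. The file names ("deploy.prototxt", "skin_vgg19.caffemodel", "test.txt") and the assumption that the deploy protocol's softmax output is named "prob" are placeholders; the preprocessing mirrors the data layer settings of Section 2.2. This is a sketch of the evaluation procedure, not the exact code behind the reported numbers.

    import caffe
    import numpy as np

    caffe.set_mode_gpu()
    net = caffe.Net("deploy.prototxt", "skin_vgg19.caffemodel", caffe.TEST)
    net.blobs["data"].reshape(1, 3, 224, 224)  # single-image batches

    # Per-channel ImageNet mean, read from the mean file used during training.
    blob = caffe.proto.caffe_pb2.BlobProto()
    blob.ParseFromString(open("imagenet_mean.binaryproto", "rb").read())
    mean_pixel = caffe.io.blobproto_to_array(blob)[0].mean(1).mean(1)

    transformer = caffe.io.Transformer({"data": net.blobs["data"].data.shape})
    transformer.set_transpose("data", (2, 0, 1))     # HxWxC -> CxHxW
    transformer.set_channel_swap("data", (2, 1, 0))  # RGB -> BGR
    transformer.set_raw_scale("data", 255)           # [0,1] -> [0,255]
    transformer.set_mean("data", mean_pixel)

    top1 = top5 = total = 0
    for line in open("test.txt"):                # "<image path> <label>" lines
        path, label = line.split()
        image = caffe.io.load_image(path)        # float RGB image in [0,1]
        net.blobs["data"].data[0] = transformer.preprocess("data", image)
        scores = net.forward()["prob"][0]        # 23 class probabilities
        ranked = np.argsort(scores)[::-1]        # labels sorted by confidence
        top1 += int(ranked[0] == int(label))
        top5 += int(int(label) in ranked[:5])
        total += 1

    print("Top-1: %.1f%%  Top-5: %.1f%%"
          % (100.0 * top1 / total, 100.0 * top5 / total))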
3.2. Test on the OLE Dataset

We then investigate the performance of the CNN on the OLE dataset. Our training set is the 17,630 skin disease images from Dermnet, but for the test set we use the OLE images. Note that in this case the training set has 1,000 more Dermnet images than in the previous experiment, so we retrain the CNN. Since all three models show similar performance and VGG19 performs slightly better, we fine-tune the CNN from the VGG19 model only. The Top-1 and Top-5 accuracies are shown in Table 4.

    CNN Model         Top-1 Accuracy    Top-5 Accuracy
    VGG19             24.8%             61.7%
    VGG19 Improved    31.1%             69.5%

Table 4: Top-1 and Top-5 accuracies of the CNNs fine-tuned from the VGG19 model only. All CNNs are tested on the OLE images. For VGG19 Improved, the refined dataset is used.

The new CNN yields only 24.8% Top-1 accuracy and 61.7% Top-5 accuracy on the OLE dataset. This is reasonable, as the OLE dataset may contain skin disease appearances that the CNN cannot learn from the Dermnet dataset. Figure 6(a) shows the confusion matrix of the CNN on the OLE dataset. As shown in Table 2, the OLE dataset only covers a subset of the 23 top-level classes; hence, many rows of the matrix are all zeros. Consistent with the accuracy, the diagonal elements of the confusion matrix show little confidence.

[Figure 6: The confusion matrices of the (a) VGG19 and (b) VGG19 Improved CNNs. Both are tested on the OLE images. For VGG19 Improved, the refined dataset is used.]

Based on the observations above, we further analyze the CNN by selecting test images from different skin disease classes and retrieving their nearest neighbors in the training set. We choose the output of the "fc7" layer as the feature vector. We choose this layer over others because "fc7" is the last layer before the final output layer (the one that outputs class scores) and should therefore contain more class-specific details of the Dermnet dataset. In addition, the feature vector has 4,096 dimensions, which can hold a great deal of information about the input image. To get the nearest neighbors, we first build a feature database from all the skin disease images in the training set. Then, for a given test image, we obtain its feature vector from the "fc7" layer and calculate the Euclidean distances between this vector and all the features in the database. The five training images with the smallest distances to the test image are chosen as the nearest neighbors.
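The retrieval procedure just described can be sketched in a few lines of PyCaffe and NumPy. As before, the file names are placeholders, the preprocessing is assumed to match the data layer of Section 2.2 (mean subtraction is omitted here for brevity, set up as in the previous sketch), and net.blobs["fc7"] holds the 4096-dimensional activation used as the feature vector.

    import caffe
    import numpy as np

    caffe.set_mode_gpu()
    net = caffe.Net("deploy.prototxt", "skin_vgg19.caffemodel", caffe.TEST)
    net.blobs["data"].reshape(1, 3, 224, 224)
    transformer = caffe.io.Transformer({"data": net.blobs["data"].data.shape})
    transformer.set_transpose("data", (2, 0, 1))
    transformer.set_channel_swap("data", (2, 1, 0))
    transformer.set_raw_scale("data", 255)
    # (mean subtraction as in the previous sketch is omitted for brevity)

    def fc7_feature(path):
        """Forward one image and return its 4096-dimensional fc7 activation."""
        image = caffe.io.load_image(path)
        net.blobs["data"].data[0] = transformer.preprocess("data", image)
        net.forward()
        return net.blobs["fc7"].data[0].copy()  # copy: the blob is reused

    # Feature database over all training images listed in train.txt.
    train_paths = [line.split()[0] for line in open("train.txt")]
    database = np.stack([fc7_feature(p) for p in train_paths])

    def nearest_neighbors(test_path, k=5):
        """Return the k training images closest to the test image in fc7 space."""
        query = fc7_feature(test_path)
        dists = np.linalg.norm(database - query, axis=1)  # Euclidean distances
        return [train_paths[i] for i in np.argsort(dists)[:k]]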
Figures 4 and 5 show the image retrieval results for two test images from the OLE dataset. In each figure, the top-left image is the test image and the rest are the retrieved neighbors. Figure 4 uses a tinea versicolor image, labeled 18 according to Table 2, as the input. We find that most of the retrieved neighbors are also labeled 18 and have a very similar appearance to the input image. Figure 5 likewise retrieves a set of neighbors that look very similar to the input, especially the second image in the first row. However, all the retrieved neighbors in Figure 5 carry a different label than the input image, which indicates a misclassification. This suggests that the CNN only picks images with a coarse similarity and does not look into the details. One possible reason is that the CNN does not have enough images in the training set to learn the details. In the case of Figure 5, for example, if the training set contained more side-view shots of the given disease, the CNN would get to learn the detailed information on the face instead of the face itself.

[Figure 4: Image retrieval for a correctly classified image. Top left is the test image; the rest are its neighbors. 14: Psoriasis. 18: Fungal Infections.]

[Figure 5: Image retrieval for a misclassified image. Top left is the test image; the rest are its neighbors. 14: Psoriasis. 18: Fungal Infections.]

To verify our assumption, we design another experiment. As the misclassification is probably due to the lack of training images for certain diseases, we refine both the training set and the test set by removing skin diseases with inadequate photos. If the CNN is trained correctly, it should then learn enough details from the refined training set, and, for any test image whose skin disease is included in the refined training set, it should classify the image correctly with higher confidence. The red bars in Figure 1 denote the classes we removed from the training set (Dermnet) and the test set (OLE): the skin diseases that appear in both the training set and the test set but are low in image numbers. We then retrain the CNN using the refined training set. The Top-1 and Top-5 accuracies of the newly trained CNN on the refined test set are given in Table 4: the Top-1 accuracy is 31.1% and the Top-5 accuracy is 69.5%, which are 6.3 and 7.8 percentage points higher, respectively, than in the previous experiment. The confusion matrix is shown in Figure 6(b). In a pairwise comparison between Figure 6(a) and (b), the confusion matrix of the new CNN has higher values in its diagonal elements and lower values in its off-diagonal elements than its old counterpart.

4. Conclusion and Future Work

We have investigated the feasibility of building a universal skin disease classification system using deep CNNs. We tackle this problem by fine-tuning ImageNet pretrained models (VGG16, VGG19, GoogleNet) with the Dermnet dataset. Our experiments show that current state-of-the-art CNN models can achieve as high as 73.1% Top-1 accuracy (VGG19) and 91.0% Top-5 accuracy (GoogleNet) when testing on the Dermnet dataset. We further explored the performance of the CNN architecture when testing on a different dataset (OLE). We found that the classification system could only achieve 24.8% Top-1 accuracy and 61.7% Top-5 accuracy there, due to the lack of broader variance in the training set, and we showed that by increasing the variance of the training set the Top-1 and Top-5 accuracies can be improved to 31.1% and 69.5%.

In future research, we hope to push this work much further and obtain better accuracy. There are several ways to improve our work. First, since ImageNet data are not specialized for skin images, the ImageNet pretrained models may not be the best choice for skin disease classification; we should train a CNN model from scratch and test its performance. Second, the Dermnet images are organized using a biological taxonomy, which is not the best choice for computer vision applications. We will work with a dermatologist to design a visually organized taxonomy and apply it to our classifier, an idea inspired by [10]. Third, as the experimental results suggest that more variance in the training set leads to better accuracy, we should increase the size of our training set. Also, since the images retrieved by the networks are closely related to the ground truth, we may design a hierarchical classification algorithm that uses the retrieved images to improve the accuracy.
References

[1] J. Arevalo, A. Cruz-Roa, V. Arias, E. Romero, and F. A. González. An unsupervised feature learning framework for basal cell carcinoma image analysis. Artificial Intelligence in Medicine, 2015.
[2] J. Arroyo and B. Zapirain. Automated detection of melanoma in dermoscopic images. In J. Scharcanski and M. E. Celebi, editors, Computer Vision Techniques for the Diagnosis of Skin Cancer, Series in BioEngineering, pages 139–192. Springer Berlin Heidelberg, 2014.
[3] C. Barata, J. Marques, and T. Mendonça. Bag-of-features classification model for the diagnose of melanoma in dermoscopy images using color and texture descriptors. In M. Kamel and A. Campilho, editors, Image Analysis and Recognition, volume 7950 of Lecture Notes in Computer Science, pages 547–555. Springer Berlin Heidelberg, 2013.
[4] Y. Bengio, A. Courville, and P. Vincent. Representation learning: A review and new perspectives. IEEE Trans. Pattern Anal. Mach. Intell., 35(8):1798–1828, Aug. 2013.
[5] H. Chang, Y. Zhou, A. Borowsky, K. Barner, P. Spellman, and B. Parvin. Stacked predictive sparse decomposition for classification of histology sections. International Journal of Computer Vision, 113(1):3–18, 2014.
[6] N. Cox and I. Coulson. Diagnosis of skin disease. Rook's Textbook of Dermatology, 7th edn. Oxford: Blackwell Science, 5, 2004.
[7] A. Cruz-Roa, A. Basavanhally, F. González, H. Gilmore, M. Feldman, S. Ganesan, N. Shih, J. Tomaszewski, and A. Madabhushi. Automatic detection of invasive ductal carcinoma in whole slide images with convolutional neural networks. In SPIE Medical Imaging, pages 904103–904103. International Society for Optics and Photonics, 2014.
[8] A. A. Cruz-Roa, J. E. A. Ovalle, A. Madabhushi, and F. A. G. Osorio. A deep learning architecture for image representation, visual interpretability and automated basal-cell carcinoma cancer detection. In Medical Image Computing and Computer-Assisted Intervention, MICCAI 2013, pages 403–410. Springer, 2013.
[9] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. ImageNet: A large-scale hierarchical image database. In Computer Vision and Pattern Recognition, CVPR 2009, IEEE Conference on, pages 248–255, June 2009.
[10] A. Esteva, B. Kuprel, and S. Thrun. Deep networks for early stage skin disease and skin cancer classification.
[11] G. Fabbrocini, V. Vita, S. Cacciapuoti, G. Leo, C. Liguori, A. Paolillo, A. Pietrosanto, and P. Sommella. Automatic diagnosis of melanoma based on the 7-point checklist. In J. Scharcanski and M. E. Celebi, editors, Computer Vision Techniques for the Diagnosis of Skin Cancer, Series in BioEngineering, pages 71–107. Springer Berlin Heidelberg, 2014.
[12] S. Ioffe and C. Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. CoRR, abs/1502.03167, 2015.
[13] G. K. Jana, A. Gupta, A. Das, R. Tripathy, and P. Sahoo. Herbal treatment to skin diseases: A global approach. Drug Invention Today, 2(8):381–384, August 2010.
[14] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell. Caffe: Convolutional architecture for fast feature embedding. arXiv preprint arXiv:1408.5093, 2014.
[15] C. M. Lawrence and N. H. Cox. Physical Signs in Dermatology: Color Atlas and Text. Wolfe Publishing (SC), 1993.
[16] A. Masood and A. Ali Al-Jumaily. Computer aided diagnostic support system for skin cancer: A review of techniques and algorithms. International Journal of Biomedical Imaging, 2013, 2013.
[17] H. Pehamberger, A. Steiner, and K. Wolff. In vivo epiluminescence microscopy of pigmented skin lesions. I. Pattern analysis of pigmented skin lesions. Journal of the American Academy of Dermatology, 17(4):571–583, 1987.
[18] A. S. Razavian, H. Azizpour, J. Sullivan, and S. Carlsson. CNN features off-the-shelf: An astounding baseline for recognition. CoRR, abs/1403.6382, 2014.
[19] J. K. Robinson. Sun exposure, sun protection, and vitamin D. JAMA, 294(12):1541–1543, 2005.
[20] H. W. Rogers, M. A. Weinstock, S. R. Feldman, and B. M. Coldiron. Incidence estimate of nonmelanoma skin cancer (keratinocyte carcinomas) in the US population, 2012. JAMA Dermatology, 2015.
[21] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, and L. Fei-Fei. ImageNet Large Scale Visual Recognition Challenge. International Journal of Computer Vision (IJCV), 115(3):211–252, 2015.
[22] A. Sáez, B. Acha, and C. Serrano. Pattern analysis in dermoscopic images. In J. Scharcanski and M. E. Celebi, editors, Computer Vision Techniques for the Diagnosis of Skin Cancer, Series in BioEngineering, pages 23–48. Springer Berlin Heidelberg, 2014.
[23] P. Sermanet, D. Eigen, X. Zhang, M. Mathieu, R. Fergus, and Y. LeCun. OverFeat: Integrated recognition, localization and detection using convolutional networks. CoRR, abs/1312.6229, 2013.
[24] R. L. Siegel, K. D. Miller, and A. Jemal. Cancer statistics, 2015. CA: A Cancer Journal for Clinicians, 65(1):5–29, 2015.
[25] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. CoRR, abs/1409.1556, 2014.
[26] A. Steiner, H. Pehamberger, and K. Wolff. Improvement of the diagnostic accuracy in pigmented skin lesions by epiluminescent light microscopy. Anticancer Research, 7(3 Pt B):433–434, 1986.
[27] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. CoRR, abs/1409.4842, 2014.
[28] H. Wang, A. Cruz-Roa, A. Basavanhally, H. Gilmore, N. Shih, M. Feldman, J. Tomaszewski, F. Gonzalez, and A. Madabhushi. Cascaded ensemble of convolutional neural networks and handcrafted features for mitosis detection. In SPIE Medical Imaging, pages 90410B–90410B. International Society for Optics and Photonics, 2014.
[29] H. Wang, A. Cruz-Roa, A. Basavanhally, H. Gilmore, N. Shih, M. Feldman, J. Tomaszewski, F. Gonzalez, and A. Madabhushi. Mitosis detection in breast cancer pathology images by combining handcrafted and convolutional neural network features. Journal of Medical Imaging, 1(3):034003–034003, 2014.
[30] J. D. Whited and J. M. Grichnik. Does this patient have a mole or a melanoma? JAMA, 279(9):696–701, 1998.
[31] F. Xie, Y. Wu, Z. Jiang, and R. Meng. Dermoscopy image processing for Chinese. In J. Scharcanski and M. E. Celebi, editors, Computer Vision Techniques for the Diagnosis of Skin Cancer, Series in BioEngineering, pages 109–137. Springer Berlin Heidelberg, 2014.
[32] J. Yosinski, J. Clune, Y. Bengio, and H. Lipson. How transferable are features in deep neural networks? CoRR, abs/1411.1792, 2014.