Supporting Materials for
A Multi-resolution Textural Approach to Diagnostic Neuropathology Reporting

Mohammad Faizal Ahmad Fauzi (a,*), Hamza Numan Gokozan (b,*), Brad Elder (c), Vinay K. Puduvalli (d), Christopher R. Pierson (b,e,f), José Javier Otero (b), Metin N. Gurcan (g)

(a) Faculty of Engineering, Multimedia University, Cyberjaya, Selangor, Malaysia
(b) Department of Pathology, The Ohio State University, Columbus, Ohio, USA
(c) Department of Neurological Surgery, The Ohio State University, Columbus, Ohio, USA
(d) Division of Neuro-oncology, The Ohio State University Wexner Medical Center, Columbus, Ohio, USA
(e) Nationwide Children's Hospital Department of Pathology and Laboratory Medicine, Columbus, Ohio, USA
(f) Division of Anatomy, The Ohio State University, Columbus, OH, USA
(g) Department of Biomedical Informatics, The Ohio State University, Columbus, Ohio, USA
(*) These authors contributed equally to the work.

Corresponding Authors:

Dr. Mohammad Faizal Ahmad Fauzi
Faculty of Engineering
Multimedia University
Jalan Multimedia
63100 Cyberjaya
Selangor, MALAYSIA
Tel: +603-8312 5330
Fax: +603-8318 3029
Email: faizal1@mmu.edu.my

Dr. José Javier Otero
Assistant Professor
The Ohio State University College of Medicine
Department of Pathology
Division of Neuropathology
4169 Graves Hall
333 W 10th Avenue
Columbus, OH 43210
Tel: 614-685-6799
Fax: 614-292-5849
Email: jose.otero@osumc.edu

Dr. Metin Gurcan
250 Lincoln Tower
1800 Cannon Drive
Columbus, OH 43210
Office: 320-K Lincoln Tower
Tel: (614) 688-9857
Fax: (614) 688-6600
Email: metin.gurcan@osumc.edu
lab: http://bmi.osu.edu/~cialab

Supplemental Materials and Methods

Discrete Wavelet Frames

Two-dimensional wavelet transform performs a spatial/spatial-frequency analysis on an image by repeatedly decomposing the image in the lower frequency bands, followed by sub-sampling [35]. A full wavelet decomposition of an image results in an array of wavelet coefficients, the same shape and size as the original image.
A one-level wavelet decomposition of an image results in four separate channels, namely the LL (low frequency band), LH and HL (medium frequency bands), and HH (high frequency band) channels. The decomposition can be performed repeatedly on the LL channel until a specific criterion is met. Because of this decomposition structure, the wavelet transform is also known as the pyramidal wavelet transform (PWT). The number of channels is given by 3l+1, where l is the number of decomposition levels.

The discrete wavelet frames (DWF) transform [36-39] is nearly identical to the standard wavelet transform, except that it upsamples the filters rather than downsampling the image. While the frame representation is overcomplete and computationally more intensive than the PWT, it has the advantage of being translation invariant. This is particularly important in our work since the glial processes (and the nuclei) can appear anywhere in the image. In our previous work [40-42], we found that DWF outperformed several other texture feature methods, including PWT, the tree-structured wavelet transform (TWT), the discrete cosine transform (DCT), Gabor filters, Laws' filters, and the multiresolution simultaneous autoregressive model.

Given an image, the DWF decomposes it into channels using the same method as the wavelet transform, but without the subsampling process. This results in four filtered images of the same size as the input image. The decomposition is then continued on the LL channel only, as in the wavelet transform, but since the image is not sub-sampled, the filter has to be upsampled by inserting zeros between its coefficients. To compute the features, the mean energy of each channel is used, given as:

\[ f(k) = \frac{1}{M \times N} \sum_{i=0}^{M-1} \sum_{j=0}^{N-1} \left| W_k(i, j) \right| \tag{1} \]

in which M and N are the number of rows and columns of the image, and W_k is the k-th channel (filtered image).
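The decomposition and the mean energy of Equation (1) can be sketched in pure Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes wrap-around (circular) boundary handling, uses the standard Daubechies four-tap coefficients (the wavelet named later in this supplement), leaves the filter gain unnormalized, and all function names (`upsample`, `filter_2d`, `dwf_features`) are hypothetical.

```python
import math

# Daubechies four-tap low-pass filter; the high-pass filter is its
# quadrature mirror (reversed taps with alternating signs).
S3 = math.sqrt(3.0)
LOW = [c / (4 * math.sqrt(2.0)) for c in (1 + S3, 3 + S3, 3 - S3, 1 - S3)]
HIGH = [(-1) ** k * LOW[len(LOW) - 1 - k] for k in range(len(LOW))]


def upsample(taps, level):
    """Insert 2**level - 1 zeros between taps (the image is never subsampled)."""
    step = 2 ** level
    out = [0.0] * ((len(taps) - 1) * step + 1)
    for k, c in enumerate(taps):
        out[k * step] = c
    return out


def filter_rows(image, taps):
    """Circularly convolve every row with a 1-D filter (assumed boundary rule)."""
    n = len(image[0])
    return [[sum(c * row[(x - k) % n] for k, c in enumerate(taps) if c)
             for x in range(n)] for row in image]


def transpose(image):
    return [list(col) for col in zip(*image)]


def filter_2d(image, row_taps, col_taps):
    """Separable filtering: along rows first, then along columns."""
    tmp = filter_rows(image, row_taps)
    return transpose(filter_rows(transpose(tmp), col_taps))


def mean_energy(channel):
    """Equation (1): mean absolute coefficient over an M x N channel."""
    m, n = len(channel), len(channel[0])
    return sum(abs(v) for row in channel for v in row) / (m * n)


def dwf_features(image, levels):
    """Return 3*levels + 1 mean energies, ordered [LH, HL, HH] per level, LL last."""
    features = []
    ll = image
    for level in range(levels):
        lo, hi = upsample(LOW, level), upsample(HIGH, level)
        lh = filter_2d(ll, lo, hi)
        hl = filter_2d(ll, hi, lo)
        hh = filter_2d(ll, hi, hi)
        ll = filter_2d(ll, lo, lo)  # only the LL channel is decomposed further
        features += [mean_energy(lh), mean_energy(hl), mean_energy(hh)]
    features.append(mean_energy(ll))
    return features
```

For a perfectly flat image the detail channels carry no energy, so all features except the last are (numerically) zero; textured regions such as fibrillary backgrounds would instead spread energy across the detail channels.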
The wavelet frames allow the algorithm to analyze the content of the image at different frequency ranges, thus making texture classification possible.

k-Nearest Neighbor Classification

The k-nearest neighbor algorithm (k-nn) is a non-parametric method for classifying images or objects based on the closest training examples in the feature space. An advantage of k-nn is that it can provide a degree of confidence in the classification. As an example, using k=5 in a two-class problem, for an image to be classified into a particular class, at least three of the five nearest neighbors must belong to that class. This corresponds to 60%, 80%, or 100% of the neighbors, which can be used as a reference to gauge the confidence level of the classification. The normalized Euclidean metric is used to prevent any particular dimension from dominating the distance measure, and is given as:

\[ d(i, j) = \sum_{k} \left( \frac{f_i(k) - f_j(k)}{\sigma(k)} \right)^2 \tag{2} \]

in which i and j denote two image patterns, f(k) is the k-th feature component, and σ(k) is the standard deviation of the distribution of feature f(k) over the entire database, used to normalize the individual feature components.

Sensitivity, Precision, and Accuracy

Sensitivity and precision are defined by:

\[ \text{Sensitivity} = \frac{TP}{TP + FN} \tag{3} \]

\[ \text{Precision} = \frac{TP}{TP + FP} \tag{4} \]

where TP, FN, and FP denote the numbers of true positives, false negatives, and false positives, respectively. The classification accuracy is defined by:

\[ \text{Accuracy} = \frac{\text{Correctly classified cells}}{\text{Total number of cells}} \tag{5} \]

Supplemental Classification of Glioblastoma and Metastatic Neoplasm

Region of Interest Segmentation

A key diagnostic feature for determining whether cells represent neoplastic cells of a metastasis or of a high grade glioma is the presence of glial processes between cell bodies, which appear as a meshwork of eosinophilic fibers in H&E stained cytologic preps. From an image analysis perspective, these glial processes appear as anisotropic thin linear structures interspersed between cells, as shown in the first 2 rows of Figure 1(a).
Metastasis tissue, on the other hand, appears more homogeneous in the background region, as can be seen in the last 2 rows of Figure 1(a). The homogeneous background is due to the lack of glial-type filaments; therefore, our approach to classification was first to segment our region of interest, which is the nuclei-free region of the tissue. For GB tissue, this region consists of the anisotropic thin line segments, while for metastasis tissue it consists of the homogeneous segments. Since nuclei in H&E stained images correspond to dark blue pixels, the segmentation can be done by the visually meaningful decomposition-based segmentation procedure [23-24]. To ensure proper feature extraction, morphological erosion is carried out on the segmented regions of interest. In this workflow, we use a disk structuring element with radius 5. We conclude that the segmentation stage is critical, since including the nuclei regions during feature extraction contaminates the features, ultimately resulting in a less accurate classification.

Feature Extraction and Classification

Four levels of decomposition are carried out using the Daubechies four-tap wavelet, which results in 13 channels of filtered images. Because there is no preferred orientation of the glial processes, the LH and HL channels at each level can be combined, reducing the number of channels, and hence the dimension of the feature vectors, to nine. The four-tap wavelet basis combined with the disk structuring element of radius five ensures that none of the coefficients from the non-ROI region are considered in the feature vector computation. Given the segmented region of interest from the previous stage, the discrete wavelet frames (DWF) transform is applied to the image, but the mean energy of each channel is calculated only on the regions of interest.
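The ROI handling described above can be sketched as follows. This is an illustrative sketch only: the supplement does not state how the LH and HL channels are combined, so averaging their energies is an assumption, as are the treatment of pixels outside the image as background and the function names (`erode`, `masked_mean_energy`, `combine_orientations`).

```python
def disk_offsets(radius):
    """Offsets (dy, dx) covered by a disk-shaped structuring element."""
    r = range(-radius, radius + 1)
    return [(dy, dx) for dy in r for dx in r
            if dy * dy + dx * dx <= radius * radius]


def erode(mask, radius):
    """Binary erosion: keep a pixel only if the whole disk fits inside the ROI.

    Pixels outside the image are treated as background (an assumption).
    """
    rows, cols = len(mask), len(mask[0])
    offsets = disk_offsets(radius)

    def inside(y, x):
        return 0 <= y < rows and 0 <= x < cols and mask[y][x]

    return [[all(inside(y + dy, x + dx) for dy, dx in offsets)
             for x in range(cols)] for y in range(rows)]


def masked_mean_energy(channel, mask):
    """Mean absolute coefficient computed over ROI pixels only."""
    values = [abs(v) for crow, mrow in zip(channel, mask)
              for v, m in zip(crow, mrow) if m]
    return sum(values) / len(values) if values else 0.0


def combine_orientations(features):
    """Merge LH and HL energies per level: 3l+1 features become 2l+1.

    Input order per level is assumed to be [LH, HL, HH], with LL last.
    Averaging LH and HL is an assumption made for illustration.
    """
    merged = []
    for i in range(0, len(features) - 1, 3):
        lh, hl, hh = features[i:i + 3]
        merged += [(lh + hl) / 2.0, hh]
    merged.append(features[-1])
    return merged
```

With four decomposition levels, `combine_orientations` maps the 13 channel energies to the nine-dimensional feature vector mentioned above; `erode(mask, 5)` corresponds to the radius-5 disk.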
For our nine-dimensional feature vectors derived from the DWF, the distances (dissimilarities) between the test and training images are computed using the normalized Euclidean distance metric, and the k smallest distances are considered to classify the test image. We conclude that the discrete wavelet frames texture features, fine-tuned as above, are able to accurately distinguish the glial-type filaments present in GB from the more homogeneous background in metastasis tissues.

Supplemental p53 Immunohistochemistry Reporting

Automated Nuclear Detection

Supplemental Figure 3(a) shows the flowchart of our cell detection process. Given a digitized p53 stained image, we first convert the image to gray scale by considering only the luminance channel. Otsu thresholding is then applied to convert the luminance image to binary, separating the cells (blobs) from the background. The binary image is then cleaned by filtering out very small and isolated blobs (blobs smaller than a certain minimum threshold, minarea). While Otsu thresholding is a simple and straightforward technique for detecting dark objects on a light background, its lack of spatial information makes it prone to grouping closely connected cells together as a single cell, which affects the cell counting process significantly. To address this, we developed a novel adaptive thresholding scheme for any detected region with total area greater than a particular maximum threshold, maxarea. The adaptive thresholding checks whether a large detected blob consists of multiple cells by searching for a potential valley within the blob, and segments the blob further if one is found. For the 40x magnification images used in our experiment, suitable minarea and maxarea thresholds were found to be 300 and 1000 pixels (~70 and 230 µm²), respectively.

Classification of Positive and Negative Cells

Supplemental Figure 3(b) shows the flowchart of our positive-negative cell classification process.
The classification of the cells as positive or negative is based on the intensity and color of the cells. Hence, the image is first converted into the HSV (Hue-Saturation-Value) color model. For each of the cells detected in the previous section, the centroid is determined and a 32x32 pixel block is extracted around it. The weighted Hue and Value are calculated for each block and are used for classifying the cells. The weights are inversely proportional to the pixels' distance to the centroid, so that pixels closer to the center of the block receive higher weight and those further from the center receive less. Negative p53-stained cells tend to be blue (higher Hue) and less intense (higher Value), while positive p53-stained cells tend to be brown (lower Hue) with varying intensities. Based on these properties, we developed a two-step classification rule:

1. If the weighted Value (wV) for a block is less than a particular threshold (darker), the block is classified as containing a positive cell, regardless of its weighted Hue (wH).
2. Otherwise, the classification depends on the weighted Hue: wH less than a particular threshold means the block contains a positive cell, and wH greater than the threshold means it contains a negative cell.

From our experiments, a suitable threshold value for both wH and wV for positive-negative cell classification was found to be 70.

Strong-Moderate-Weak Cells Classification

Supplemental Figure 3(c) shows the flowchart of our strong-moderate-weak cell classification process. Unlike the positive-negative classification (Section 4.3), in which cells can be distinguished based on intensity as well as hue information, classifying strong, moderate, and weak cells is much more challenging because of the more subjective appearance of the cells' intensities.
Nevertheless, we note that strongly stained cells exhibit more homogeneous intensities within the cell's boundary, while weakly stained cells exhibit more varying intensities and hence are more "textured". We therefore propose to utilize texture features on top of the intensity information to distinguish the three classes of staining strength, especially between the moderately and weakly stained cells.

As mentioned previously, another challenge in classifying positive cells is that the staining intensities differ between images. Because of this, cells with similar intensities may belong to different strength classes in different tissues. To address this, we use an adaptive thresholding approach in which the threshold value varies depending on the image content. We propose a threshold that is inversely proportional to the average intensity of all the detected cells. Cells with average intensity above the calculated threshold are automatically classified as strongly stained cells.

The remaining cells are then classified into the three classes by means of discrete wavelet frames to distinguish their "texturedness". Three levels of decomposition are carried out for each 32x32 cell block, and the mean energy from each channel is used to create 10-dimensional features. The features are then compared to the training features by means of k-nearest neighbor classification. Unlike the intra-operative consultation experiment in the previous section, where a leave-one-out strategy was employed because of the limited number of samples, here the k-nearest neighbor classifier uses 50 training samples from each of the three classes. The 150 samples are extracted from images different from the test images. Several values of k were tested, and k=23 was found to yield the highest accuracy.
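The k-nearest neighbor step with the normalized Euclidean metric of Equation (2) can be sketched as below. This is a minimal sketch, not the authors' code: it assumes σ(k) is estimated from the training vectors only (the supplement states it is taken over the entire database), falls back to σ = 1 for a constant feature, reports the vote fraction as the confidence, and uses hypothetical function names.

```python
import math
from collections import Counter


def feature_std(samples):
    """Per-dimension standard deviation over a list of feature vectors."""
    n = len(samples)
    stds = []
    for k in range(len(samples[0])):
        column = [s[k] for s in samples]
        mean = sum(column) / n
        var = sum((v - mean) ** 2 for v in column) / n
        stds.append(math.sqrt(var) or 1.0)  # guard: constant feature (assumption)
    return stds


def normalized_distance(a, b, stds):
    """Equation (2): sum of squared, sigma-normalized feature differences."""
    return sum(((x - y) / s) ** 2 for x, y, s in zip(a, b, stds))


def knn_classify(query, train_x, train_y, k):
    """Return (majority label, fraction of the k neighbors that voted for it)."""
    stds = feature_std(train_x)
    ranked = sorted(zip(train_x, train_y),
                    key=lambda pair: normalized_distance(query, pair[0], stds))
    votes = Counter(label for _, label in ranked[:k])
    label, count = votes.most_common(1)[0]
    return label, count / k
```

In the setting described above, `train_x` would hold the 150 ten-dimensional training vectors, `train_y` their strong, moderate, or weak labels, and `k` would be 23; the returned fraction plays the role of the confidence level discussed for k-nn.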
Supplemental Figures & Tables

Supplemental Figure 1: Examples of incorrectly classified tissues. (a)-(c) Glioblastoma classified as metastasis, (d)-(f) metastasis classified as glioblastoma.

Supplemental Figure 2: Training samples for the (a) strong, (b) moderate, (c) weak, and (d) negatively stained cells.

Supplemental Figure 3: (a) Nuclear detection process from digitized pathology slides, (b) Positive-negative nuclear classification process, (c) Strong-moderate-weak nuclear classification process.
Flowchart (a): p53 images -> Otsu thresholding -> filter small blobs (<minarea) -> blobs > maxarea? (yes: separate cells through adaptive thresholding) -> post-processing -> detected cells.
Flowchart (b): detected cells -> determine the centroid of each detected cell -> obtain 32x32 pixel blocks around each centroid -> compute the weighted hue (wH) and value (wV) for each 32x32 block -> wV < thresh? (yes: positive cells) -> wH < thresh? (yes: positive cells; no: negative cells).

Supplemental Table 1A: Demographic information from cytologic preparations ("smears")

Smears | Age | Sex | Location | Intraop diagnosis | Final diagnosis | Notes
S1 | 50 | F | Right frontal | Infiltrating glioma (at least grade 2)* | Anaplastic oligodendroglioma | S1*: smear concerning for higher grade
S2 | 63 | M | Left frontal | Malignant neoplasm* | GB | S2*: defer to permanent
S3 | 68 | M | Right frontal | High grade glioma | GB |
S4 | 60 | M | Left occipital | Malignant neoplasm* | GB | S4*: defer to permanent
S5 | 61 | M | Left temporal | High grade glioma* | GB | S5*: GB concerned
S6 | 37 | M | Left frontal | Infiltrating glioma | Anaplastic oligodendroglioma |
S7 | 54 | F | Right frontal | Malignant neoplasm* | Metastatic melanoma | S7*: consistent with melanoma
S8 | 66 | F | Left frontal | Metastatic adenocarcinoma* | Metastatic colon adenocarcinoma | S8*: colon primary
S9 | 56 | F | Right frontal | Metastatic carcinoma* | Metastatic breast carcinoma | S9*: breast primary
S10 | 62 | M | Spinal cord | Metastatic carcinoma | Metastatic renal cell carcinoma |
S11 | 48 | M | Right occipital/parietal | Metastatic epithelioid tumor | Metastatic pulmonary adenocarcinoma |
S12 | 74 | M | Right temporal | GB | GB |
S13 | 26 | F | Right frontal | Slightly hypercellular* | Oligodendroglioma Grade 2 | S13*: few atypical cells, non diagnostic
S14 | 62 | F | Right frontal | Infiltrating glioma | Anaplastic astrocytoma |
S15 | 50 | F | Right frontal | Malignant neoplasm* | Metastatic lung carcinoma | S15*: consistent with carcinoma
S16 | 48 | M | Right frontal | GB | GB |
S17 | 82 | M | Left temporal | Anaplastic astrocytoma (G3) | GB |
S18 | 68 | M | Lateral/medial temporal lobe | High grade glioma | GB |
S19 | 48 | M | Right frontal | GB | GB |
S20 | 64 | M | Left frontal | High grade glioma | GB |
S21 | 68 | M | Left parietal | GB | GB |
S22 | 25 | M | Right parietal/occipital | Metastatic tumor | Metastatic mixed malignant germ cell tumor |
S23 | 52 | F | Right cerebellar | Metastatic carcinoma* | Metastatic lung neuroendocrine carcinoma | S23*: consistent with pulmonary primary lesion
S24 | 67 | M | Left clival | Neoplastic tissue* | Chondrosarcoma | S24*: differential includes chondrosarcoma and chordoma
S25 | 60 | M | Left frontal | Malignant neoplasm* | Lymphoma | S25*: smear consistent with lymphoma

Supplemental Table 1B: Demographic information for the p53 analysis

Case | Age | Sex | Location | Final diagnosis | p53 stain reported in pathology report | Notes
G1 | 46 | M | Left frontal | GB | p53+ (no further info) |
G2 | 58 | M | Left frontal | GB | Scattered giant cells are + |
G3 | 77 | F | Left occipital | GB | Immunoreactivity ranging from weak to strong |
G4 | 39 | M | Right frontal | GB | Strongly diffusely positive |
G5 | 60 | F | Left temporal-occipital | GB | Immunoreactivity to p53 |
G6 | 61 | M | Left temporal | GB | Diffuse immunoreactivity to p53 |
G7 | 53 | M | Left frontal | GB | Weak to moderate, 30% |
G8 | 70 | F | Right parietal | GB | Expression noted in 50% of tm cells |
G9 | 74 | M | Right temporal | GB | Expression noted in 80% of tm cells |
G10 | 50 | F | Left frontal | GB | Expression noted in 60% of tm cells |
G11 | 48 | M | Right frontal | GB | Majority of tm cells, strong reactivity |
G12 | 55 | M | Right frontal | GB | Majority of tm cells, moderate to strong reactivity |
G13 | 82 | M | Left temporal | GB | Strong reactivity, over 70% |
G14 | 64 | M | Left frontal | GB | Less than 10% |
G15 | 63 | F | Right frontal | GB | Moderate staining in occasional tumor nuclei |
G16 | 69 | F | Left temporal | GB | Diffusely positive |
G17 | 54 | M | Left frontal | GB | Diffusely positive |
G18 | 31 | M | Brain mass* | Anaplastic astrocytoma | Diffusely positive | *Location info not available, outside consultation slide
G19 | 82 | M | Right frontal | GB | Diffusely positive |
G20 | 73 | M | Right frontal | GB | Diffusely positive, strong reactivity |
G21 | 54 | M | Right temporal | GB | Majority of tumor cells stained, weak to moderate intensity |