Supporting Materials for
A Multi-resolution Textural Approach to Diagnostic
Neuropathology Reporting
Mohammad Faizal Ahmad Fauziᵃ*, Hamza Numan Gokozanᵇ*, Brad Elderᶜ, Vinay K. Puduvalliᵈ,
Christopher R. Piersonᵇ,ᵉ,ᶠ, José Javier Oteroᵇ, Metin N. Gurcanᵍ
ᵃFaculty of Engineering, Multimedia University, Cyberjaya, Selangor, Malaysia; ᵇDepartment of Pathology, The Ohio State University, Columbus, Ohio, USA; ᶜDepartment of Neurological Surgery, The Ohio State University, Columbus, Ohio, USA; ᵈDivision of Neuro-oncology, The Ohio State University Wexner Medical Center, Columbus, Ohio, USA; ᵉNationwide Children’s Hospital Department of Pathology and Laboratory Medicine, Columbus, Ohio, USA; ᶠDivision of Anatomy, The Ohio State University, Columbus, OH, USA; ᵍDepartment of Biomedical Informatics, The Ohio State University, Columbus, Ohio, USA
*These authors contributed equally to the work.
Corresponding Authors:
Dr. Mohammad Faizal Ahmad Fauzi
Faculty of Engineering
Multimedia University
Jalan Multimedia
63100 Cyberjaya
Selangor, MALAYSIA.
Tel: +603-8312 5330
Fax: +603-8318 3029
Email: faizal1@mmu.edu.my
Dr. José Javier Otero
Assistant Professor
The Ohio State University College of Medicine
Department of Pathology
Division of Neuropathology
4169 Graves Hall
333 W 10th Avenue
Columbus, OH 43210
Tel: 614-685-6799
Fax: 614-292-5849
Email: jose.otero@osumc.edu
Dr. Metin Gurcan
250 Lincoln Tower
1800 Cannon Drive
Columbus, OH 43210
Office: 320-K Lincoln Tower
Tel: (614) 688-9857
Fax: (614) 688-6600
Email: metin.gurcan@osumc.edu
lab: http://bmi.osu.edu/~cialab
Supplemental Materials and Methods
Discrete Wavelet Frames
The two-dimensional wavelet transform performs a spatial/spatial-frequency analysis of an image by repeatedly decomposing the image in the lower frequency bands, followed by sub-sampling [35]. A full wavelet decomposition of an image results in an array of wavelet coefficients with the same shape and size as the original image. A one-level wavelet decomposition of an image results in four separate channels, namely the LL (low frequency band), LH and HL (medium frequency bands), and HH (high frequency band) channels. The decomposition can be repeated on the LL channel until a specific criterion is met. Because of this decomposition structure, the wavelet transform is also known as the pyramidal wavelet transform (PWT). The number of channels is 3l+1, where l is the number of decomposition levels.
The discrete wavelet frame (DWF) [36-39] is nearly identical to the standard wavelet transform, except that it upsamples the filters rather than downsampling the image. While the frame representation is overcomplete and computationally more intensive than the PWT, it has the advantage of being translation invariant. This is particularly important in our work, since the glial processes (and the nuclei) can appear anywhere in the image. In our previous work [40-42], we found DWF to be the best texture feature method compared with several others, such as PWT, the tree-structured wavelet transform (TWT), the discrete cosine transform (DCT), Gabor filters, Laws’ filters, and the multiresolution simultaneous autoregressive model. Given an image, the DWF decomposes it into channels using the same method as the wavelet transform, but without the subsampling step. This results in four filtered images with the same size as the input image. As in the wavelet transform, the decomposition is then continued on the LL channel only; but since the image is not subsampled, the filter has to be upsampled by inserting zeros between its coefficients.
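A minimal NumPy sketch of the undecimated (à trous) decomposition described above may help make it concrete. It assumes the Daubechies four-tap (db2) filters, periodic (circular) boundary handling, and a channel ordering of LH, HL, HH per level followed by the final LL; the names `dwf`, `upsample_filter`, and `circular_filter` are illustrative and are not the authors' implementation.

```python
import numpy as np

# Daubechies four-tap (db2) analysis filters; the high-pass is the quadrature mirror of the low-pass.
_S3 = np.sqrt(3.0)
LOW = np.array([1 + _S3, 3 + _S3, 3 - _S3, 1 - _S3]) / (4 * np.sqrt(2.0))
HIGH = LOW[::-1] * np.array([1.0, -1.0, 1.0, -1.0])


def upsample_filter(f, level):
    """Insert 2**level - 1 zeros between filter taps (no zeros at level 0)."""
    step = 2 ** level
    out = np.zeros((len(f) - 1) * step + 1)
    out[::step] = f
    return out


def circular_filter(x, f, axis):
    """Circular convolution of x with the 1-D filter f along the given axis."""
    out = np.zeros(x.shape, dtype=float)
    for k, c in enumerate(f):
        if c != 0.0:
            out += c * np.roll(x, k, axis=axis)
    return out


def dwf(image, levels):
    """Return 3*levels + 1 channels, each the same size as the input image."""
    ll = np.asarray(image, dtype=float)
    channels = []
    for level in range(levels):
        lo, hi = upsample_filter(LOW, level), upsample_filter(HIGH, level)
        rows_lo = circular_filter(ll, lo, axis=1)  # filter along each row (horizontal)
        rows_hi = circular_filter(ll, hi, axis=1)
        channels.append(circular_filter(rows_lo, hi, axis=0))  # LH
        channels.append(circular_filter(rows_hi, lo, axis=0))  # HL
        channels.append(circular_filter(rows_hi, hi, axis=0))  # HH
        ll = circular_filter(rows_lo, lo, axis=0)  # LL, decomposed again at the next level
    channels.append(ll)
    return channels


rng = np.random.default_rng(0)
img = rng.random((64, 64))
chans = dwf(img, levels=4)
shifted = dwf(np.roll(img, (3, 5), axis=(0, 1)), levels=4)
energies = np.array([np.abs(c).mean() for c in chans])
energies_shifted = np.array([np.abs(c).mean() for c in shifted])
```

Because every step is a circular convolution, circularly shifting the input only shifts each channel, so the per-channel mean energies are unchanged; this is the translation invariance the text relies on. With four levels the sketch yields 3·4 + 1 = 13 channels. Naming conventions for LH and HL differ between implementations.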
To compute the features, the mean energy of each channel is used and is given as:
π‘š−1 𝑛−1
1
𝑓(π‘˜) =
∑ ∑|π‘Šπ‘˜ (𝑖, 𝑗)|
𝑀×𝑁
(1)
𝑖=0 𝑗=0
in which M and N are the numbers of rows and columns of the image, and W_k is the k-th channel (filtered image). The wavelet frames allow the algorithm to analyze the content of the image in different frequency ranges, thus making texture classification possible.
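Equation (1) reduces to the mean absolute coefficient of each channel. The sketch below also accepts an optional boolean mask, anticipating the region-of-interest variant used for the glioblastoma/metastasis features, where the mean energy is computed only over the region of interest; `mean_energy` and `feature_vector` are hypothetical helper names.

```python
import numpy as np


def mean_energy(channel, mask=None):
    """Mean absolute coefficient of one channel, as in Equation (1).

    With mask=None the sum runs over all M x N pixels; with a boolean mask
    the average is taken only over pixels where the mask is True.
    """
    magnitude = np.abs(np.asarray(channel, dtype=float))
    if mask is None:
        return magnitude.sum() / magnitude.size
    return magnitude[mask].mean()


def feature_vector(channels, mask=None):
    """One mean-energy feature per channel."""
    return np.array([mean_energy(c, mask) for c in channels])


w = np.array([[1.0, -3.0], [2.0, -2.0]])
roi = np.array([[True, False], [False, True]])
full = mean_energy(w)         # (1 + 3 + 2 + 2) / 4
masked = mean_energy(w, roi)  # (1 + 2) / 2
```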
k-Nearest Neighbor Classification
The k-nearest neighbor (k-nn) algorithm is a non-parametric method for classifying images or objects based on the closest training examples in the feature space. An advantage of k-nn is that it can provide a degree of confidence in the classification. For example, with k=5 in a two-class problem, an image is assigned to a class only if at least three of its five nearest neighbors belong to that class. This corresponds to 60%, 80%, or 100% of the neighbors, which can be used as a reference to gauge the confidence level of the classification. The normalized Euclidean metric is used to prevent any single dimension from dominating the distance measure, and is given as:
$$d = \sum_{k} \left( \frac{f_i(k) - f_j(k)}{\sigma(k)} \right)^2 \tag{2}$$
in which i and j denote two image patterns, f(k) is the k-th feature component, and σ(k) is the standard deviation of the distribution of feature f(k) over the entire database, which is used to normalize the individual feature components.
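The confidence-aware k-nn rule and the normalized distance of Equation (2) can be sketched as follows. This is an illustration under assumptions: σ(k) is estimated from the training features (standing in for "the entire database"), ties in distance are broken by training order, and the names `normalized_distances` and `knn_classify` are hypothetical. The toy feature values are invented for the example.

```python
from collections import Counter

import numpy as np


def normalized_distances(x, train, sigma):
    """Equation (2): sum over features of ((f_i(k) - f_j(k)) / sigma(k)) ** 2."""
    return (((train - x) / sigma) ** 2).sum(axis=1)


def knn_classify(x, train, labels, k=5):
    """Return (majority label, fraction of the k neighbours that agree with it)."""
    train = np.asarray(train, dtype=float)
    sigma = train.std(axis=0)
    sigma[sigma == 0] = 1.0  # guard against constant features
    d = normalized_distances(np.asarray(x, dtype=float), train, sigma)
    nearest = np.argsort(d, kind="stable")[:k]
    label, count = Counter(labels[i] for i in nearest).most_common(1)[0]
    return label, count / k


train = [[0, 0], [1, 10], [0, 20], [1, -10], [0, 5],
         [10, 100], [11, 110], [10, 120], [11, 90], [10, 105]]
labels = ["GB"] * 5 + ["MET"] * 5
label, confidence = knn_classify([1, 8], train, labels, k=5)
```

With two classes and k=5, the returned confidence can only be 0.6, 0.8, or 1.0, matching the percentages above. Without the division by σ(k), the second feature, whose values are an order of magnitude larger, would dominate the distance.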
Sensitivity, Precision, and Accuracy:
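Sensitivity and precision are defined by:
$$\text{Sensitivity} = \frac{TP}{TP + FN} \tag{3}$$
$$\text{Precision} = \frac{TP}{TP + FP} \tag{4}$$
where TP, FP, and FN are the numbers of true positives, false positives, and false negatives, respectively.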
The classification accuracy is defined by:
π΄π‘π‘π‘’π‘Ÿπ‘Žπ‘π‘¦ =
πΆπ‘œπ‘Ÿπ‘Ÿπ‘’π‘π‘‘π‘™π‘¦ π‘π‘™π‘Žπ‘ π‘ π‘–π‘“π‘–π‘’π‘‘ 𝑐𝑒𝑙𝑙𝑠
π‘‡π‘œπ‘‘π‘Žπ‘™ π‘›π‘’π‘šπ‘π‘’π‘Ÿ π‘œπ‘“ 𝑐𝑒𝑙𝑙𝑠
(5)
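Equations (3) to (5) translate directly into code. The toy labels below are invented purely to exercise the formulas and do not correspond to any results of the study; the function names are hypothetical.

```python
def confusion_counts(y_true, y_pred, positive):
    """Count true positives, false positives and false negatives for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return tp, fp, fn


def sensitivity(tp, fn):
    return tp / (tp + fn)  # Equation (3)


def precision(tp, fp):
    return tp / (tp + fp)  # Equation (4)


def accuracy(y_true, y_pred):
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)  # Equation (5)


y_true = ["pos", "pos", "neg", "neg", "pos"]
y_pred = ["pos", "neg", "neg", "pos", "pos"]
tp, fp, fn = confusion_counts(y_true, y_pred, "pos")
```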
Supplemental Classification of Glioblastoma and Metastatic Neoplasm
Region of Interest Segmentation
A key diagnostic feature for determining whether cells are neoplastic cells of a metastasis or of a high-grade glioma is the presence of glial processes between cell bodies, which appear as a meshwork of eosinophilic fibers in H&E-stained cytologic preparations. From an image analysis perspective, these glial processes appear as anisotropic, thin, linear structures interspersed between cells, as shown in the first 2 rows of Figure 1(a). Metastatic tissue, on the other hand, appears more homogeneous in the background region, as can be seen in the last 2 rows of Figure 1(a). This homogeneous background is due to the lack of glial-type filaments; therefore, our approach to classification was first to segment our region of interest, which is the nucleus-free region of the tissue. For GB tissue, this region consists of the anisotropic thin line segments, while for metastatic tissue it consists of the homogeneous segments. Since nuclei in H&E-stained images correspond to dark blue pixels, the segmentation can be done by the visually meaningful decomposition-based segmentation procedure [23-24]. To ensure proper feature extraction, morphological erosion is carried out on the segmented regions of interest; in this workflow, we use a disk structuring element with radius 5. We conclude that the segmentation stage is critical, since including the nuclear regions during feature extraction contaminates the features and ultimately results in less accurate classification.
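The erosion step can be sketched in NumPy. The sketch assumes the nuclear mask has already been produced by the segmentation stage (the decomposition-based procedure itself is not reproduced here) and treats pixels outside the image as non-ROI; `disk`, `binary_erode`, and `roi_mask` are hypothetical helper names, and the nucleus is synthetic.

```python
import numpy as np


def disk(radius):
    """Boolean disk structuring element of the given radius."""
    y, x = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    return x * x + y * y <= radius * radius


def binary_erode(mask, selem):
    """A pixel survives only if every pixel under the (symmetric) element is True."""
    r = selem.shape[0] // 2
    padded = np.pad(mask, r, mode="constant", constant_values=False)
    out = np.ones_like(mask, dtype=bool)
    height, width = mask.shape
    for dy, dx in zip(*np.nonzero(selem)):
        out &= padded[dy:dy + height, dx:dx + width]
    return out


def roi_mask(nuclei_mask, radius=5):
    """Nucleus-free region, eroded so it stays at least radius+1 pixels from nuclei."""
    return binary_erode(~nuclei_mask, disk(radius))


nuclei = np.zeros((64, 64), dtype=bool)
nuclei[27:37, 27:37] = True  # one synthetic 10x10 "nucleus"
roi = roi_mask(nuclei, radius=5)
```

Pixels within five pixels of a nucleus (or of the image border) drop out of the region of interest, which keeps nuclear pixels from leaking into the texture statistics computed on that region.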
Feature Extraction and Classification
Four levels of decomposition are carried out using the Daubechies four-tap wavelet, which results in 13 channels of filtered images. Because the glial processes have no preferred orientation, the LH and HL channels at each level can be combined, reducing the number of channels, and thus the dimension of the feature vectors, to nine. The four-tap wavelet basis combined with the disk structuring element of radius five ensures that none of the coefficients from the non-ROI region are considered in the feature vector computation. Given the segmented region of interest from the previous stage, the DWF is applied to the image, but the mean energy of each channel is calculated only over the region of interest. For our nine-dimensional feature vectors derived from the DWF, the distances (dissimilarities) between the test and training images are computed with the normalized Euclidean distance metric, and the k smallest distances are used to classify the test image. We conclude that the DWF texture features, fine-tuned as above, are able to accurately distinguish the glial-type filaments present in GB from the more homogeneous background in metastatic tissue.
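The reduction from 13 to nine features can be made concrete. This sketch assumes the channel ordering LH, HL, HH per level followed by the final LL, and averages each LH/HL pair; the source does not state whether the pair is averaged or summed, and `combine_orientations` is a hypothetical name.

```python
import numpy as np


def combine_orientations(energies, levels=4):
    """Merge LH and HL energies per level: 3*levels + 1 -> 2*levels + 1 features.

    Assumed input order: [LH1, HL1, HH1, ..., LHn, HLn, HHn, LLn].
    The LH/HL pair is averaged here; the text only says the channels are combined.
    """
    e = np.asarray(energies, dtype=float)
    if e.size != 3 * levels + 1:
        raise ValueError("expected 3*levels + 1 channel energies")
    out = []
    for lev in range(levels):
        lh, hl, hh = e[3 * lev: 3 * lev + 3]
        out.extend([(lh + hl) / 2.0, hh])
    out.append(e[-1])  # final LL channel
    return np.array(out)


features = combine_orientations(np.arange(13), levels=4)
```

For four levels this gives 4 merged LH/HL features, 4 HH features, and 1 LL feature, that is, the nine dimensions stated above.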
Supplemental p53 Immunohistochemistry Reporting
Automated Nuclear Detection
Supplemental Figure 3(a) shows the flowchart of our cell detection process. Given a digitized p53-stained image, we first convert the image to grayscale by keeping only the luminance channel. Otsu thresholding is then applied to binarize the luminance image, separating the cells (blobs) from the background. The binary image is then cleaned by removing very small, isolated blobs (blobs smaller than a minimum area threshold, minarea). While Otsu thresholding is a simple and straightforward technique for detecting dark objects on a light background, its lack of spatial information makes it prone to grouping closely connected cells together as a single cell, which significantly affects the cell counting process. To address this, we developed a novel adaptive thresholding scheme for any detected region whose total area is greater than a maximum area threshold, maxarea. The adaptive thresholding checks whether a large detected blob consists of multiple cells by searching for potential valleys within the blob, and segments the blob further if it does. For the 40x magnification images used in our experiments, suitable minarea and maxarea thresholds were found to be 300 and 1000 pixels (~70 and 230 µm²), respectively.
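The first stages of this pipeline (Otsu thresholding, removal of blobs smaller than minarea, and flagging of blobs larger than maxarea) can be sketched in NumPy. The valley-based adaptive splitting of large blobs is not described in enough detail to reproduce, so the sketch only returns those blobs as candidates for splitting. It assumes an 8-bit luminance image and 4-connectivity; all function names are hypothetical and the test image is synthetic.

```python
from collections import deque

import numpy as np


def otsu_threshold(gray):
    """Otsu threshold of an 8-bit image: maximise the between-class variance."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                # weight of the dark class (<= t)
    mu = np.cumsum(prob * np.arange(256))  # cumulative mean of the dark class
    mu_total = mu[-1]
    with np.errstate(divide="ignore", invalid="ignore"):
        between = (mu_total * omega - mu) ** 2 / (omega * (1.0 - omega))
    between[~np.isfinite(between)] = 0.0
    return int(np.argmax(between))


def label_blobs(mask):
    """4-connected component labelling by breadth-first search."""
    labels = np.zeros(mask.shape, dtype=int)
    rows, cols = mask.shape
    count = 0
    for r0, c0 in zip(*np.nonzero(mask)):
        if labels[r0, c0]:
            continue
        count += 1
        labels[r0, c0] = count
        queue = deque([(r0, c0)])
        while queue:
            r, c = queue.popleft()
            for rr, cc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if 0 <= rr < rows and 0 <= cc < cols and mask[rr, cc] and not labels[rr, cc]:
                    labels[rr, cc] = count
                    queue.append((rr, cc))
    return labels, count


def detect_blobs(gray, minarea=300, maxarea=1000):
    """Binarise with Otsu, drop blobs < minarea, flag blobs > maxarea for splitting."""
    mask = gray <= otsu_threshold(gray)  # cells are darker than the background
    labels, count = label_blobs(mask)
    areas = np.bincount(labels.ravel(), minlength=count + 1)
    kept, oversized = [], []
    for lab in range(1, count + 1):
        if areas[lab] < minarea:
            continue  # small isolated blob: left out of both lists
        (oversized if areas[lab] > maxarea else kept).append(lab)
    return labels, kept, oversized


img = np.full((200, 200), 220, dtype=np.uint8)  # light background
img[20:40, 20:40] = 40      # 400-pixel blob: kept
img[100:140, 100:140] = 40  # 1600-pixel blob: candidate for splitting
img[180:185, 10:15] = 40    # 25-pixel speck: removed
labels, kept, oversized = detect_blobs(img)
```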
Classification of Positive and Negative Cells
Supplemental Figure 3(b) shows the flowchart of our positive-negative cell classification process. The classification of cells as positive or negative is based on their intensity and color; hence, the image is first converted into the HSV (Hue-Saturation-Value) color model. For each cell detected in the previous step, its centroid is determined and a 32x32 pixel block is extracted around it. The weighted Hue and Value are calculated for each block and are used to classify the cells. The weights are inversely proportional to each pixel’s distance from the centroid, so pixels closer to the center of the block receive higher weight and those farther from the center receive less weight.
Negative p53-stained cells tend to be blue (higher Hue) and less intense (higher Value), while positive p53-stained cells tend to be brown (lower Hue) with varying intensities. Based on these properties, we developed a two-step classification rule:
1. If the weighted Value (wV) of a block is less than a threshold (i.e., the block is darker), the block is classified as containing a positive cell, regardless of its weighted Hue (wH).
2. Otherwise, the classification depends on the weighted Hue: a wH below the threshold means the block contains a positive cell, and a wH above the threshold means it contains a negative cell.
From our experiments, a threshold value of 70 was found to be suitable for both wH and wV.
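The two-step rule can be sketched as follows. The source does not state the scales behind the threshold of 70 or the exact weighting function, so this sketch assumes hue in degrees [0, 360), value on [0, 255], and a weight of 1/(1 + d) for a pixel at distance d from the block center; the function names and the solid test colors are hypothetical.

```python
import colorsys

import numpy as np


def weighted_hue_value(block):
    """Distance-weighted mean hue (degrees) and value (0-255) of an RGB block."""
    h, w, _ = block.shape
    yy, xx = np.mgrid[0:h, 0:w]
    dist = np.hypot(yy - (h - 1) / 2.0, xx - (w - 1) / 2.0)
    weights = 1.0 / (1.0 + dist)  # assumed form of "inversely proportional"
    hue = np.empty((h, w))
    val = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            r, g, b = (float(c) / 255.0 for c in block[i, j])
            hh, _, vv = colorsys.rgb_to_hsv(r, g, b)
            hue[i, j] = hh * 360.0
            val[i, j] = vv * 255.0
    total = weights.sum()
    return (weights * hue).sum() / total, (weights * val).sum() / total


def classify_block(block, v_thresh=70.0, h_thresh=70.0):
    """Rule 1: dark blocks are positive. Rule 2: otherwise low hue (brown) is positive."""
    wh, wv = weighted_hue_value(block)
    if wv < v_thresh:
        return "positive"
    return "positive" if wh < h_thresh else "negative"


def solid(rgb):
    """A uniform 32x32 RGB block, for illustration only."""
    return np.tile(np.array(rgb, dtype=np.uint8), (32, 32, 1))


blue = classify_block(solid([40, 60, 200]))   # hue about 232 degrees, value 200
brown = classify_block(solid([150, 90, 40]))  # hue about 27 degrees, value 150
dark = classify_block(solid([20, 20, 60]))    # value 60, below the value threshold
```

Averaging hue linearly ignores its wraparound at 0/360 degrees; for reddish pixels near that boundary, a circular mean would be the safer choice.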
Strong-Moderate-Weak Cells Classification
Supplemental Figure 3(c) shows the flowchart of our strong-moderate-weak cell classification process. Unlike the positive-negative classification (Section 4.3), in which the classes can be distinguished using intensity as well as hue information, classifying cells as strong, moderate, or weak is much more challenging because the perceived staining intensity of the cells is more subjective. Nevertheless, we note that strongly stained cells exhibit more homogeneous intensities within the cell boundary, while weakly stained cells exhibit more varying intensities and are hence more “textured”. We therefore propose to use texture features in addition to intensity information to distinguish the three staining-strength classes, especially moderately from weakly stained cells. As mentioned previously, another challenge in classifying positive cells is that staining intensities differ between images. Because of this, cells with similar intensities may belong to different strength classes in different tissues. To address this, we use an adaptive threshold whose value depends on the image content: the threshold is inversely proportional to the average intensity of all the detected cells.
Cells with average intensity above the calculated threshold are automatically classified as strongly stained cells. The remaining cells are then classified into the three classes by means of discrete wavelet frames, which capture their “texturedness”. Three levels of decomposition are carried out for each 32x32 cell block, and the mean energy of each channel is used to create a 10-dimensional feature vector (3 × 3 + 1 = 10 channels). The features are then compared with the training features by k-nearest neighbor classification. Unlike the intra-operative consultation experiment in the previous section, where a leave-one-out strategy was employed because of limited samples, here the k-nearest neighbor classifier uses 50 training samples from each of the three classes. These 150 samples are extracted from images different from the test images. Several values of k were tested, and k=23 was found to yield the highest accuracy.
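The adaptive split can be illustrated as below. The proportionality constant `c` is a hypothetical parameter (the source gives no value), the per-cell average intensities are assumed to be computed elsewhere, and the function name is hypothetical. Cells not split off as strong would then go to a k-nn classifier (k=23 in the text) on their 10-dimensional DWF features.

```python
import numpy as np


def adaptive_strong_split(cell_intensities, c):
    """Threshold = c / mean intensity of all detected cells; cells above it are 'strong'."""
    intensities = np.asarray(cell_intensities, dtype=float)
    threshold = c / intensities.mean()
    is_strong = intensities > threshold
    return threshold, is_strong


threshold, is_strong = adaptive_strong_split([10.0, 20.0, 30.0], c=400.0)
remaining = np.flatnonzero(~is_strong)  # indices passed on to the texture-based k-nn
```

Because the threshold is recomputed from each image's own cells, the same absolute intensity can fall on different sides of it in different slides, which is the point of making it adaptive.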
Supplemental Figures & Tables
Supplemental Figure 1: Examples of incorrectly classified tissues: (a)-(c) glioblastoma classified as metastasis; (d)-(f) metastasis classified as glioblastoma
Supplemental Figure 2: Training samples for the (a) strong, (b) moderate, (c) weak, and (d) negatively stained cells
Supplemental Figure 3: (a) Nuclear detection process from digitized pathology slides, (b) Positive-negative nuclear classification process, (c) Strong-moderate-weak nuclear classification process
(a) p53 images → Otsu thresholding → filter small blobs (< minarea) → blobs > maxarea? (Yes: separate cells through adaptive thresholding) → post-processing → detected cells
(b) Detected cells → determine the centroid of each detected cell → obtain 32x32 pixel blocks around each centroid → compute the weighted hue (wH) and value (wV) for each 32x32 block → wV < thresh? Yes: positive cells; No: wH < thresh? Yes: positive cells; No: negative cells
Supplemental Table 1A: Demographic information from cytologic preparations (“smears”)
Smear | Age | Sex | Location | Intraop diagnosis | Final diagnosis | Notes
S1 | 50 | F | Right frontal | Infiltrating glioma (at least grade 2)* | Anaplastic oligodendroglioma | *Smear concerning for higher grade
S2 | 63 | M | Left frontal | Malignant neoplasm* | GB | *defer to permanent
S3 | 68 | M | Right frontal | High grade glioma | GB |
S4 | 60 | M | Left occipital | Malignant neoplasm* | GB | *defer to permanent
S5 | 61 | M | Left temporal | High grade glioma* | GB | *GB concerned
S6 | 37 | M | Left frontal | Infiltrating glioma | Anaplastic oligodendroglioma |
S7 | 54 | F | Right frontal | Malignant neoplasm* | Metastatic melanoma | *consistent with melanoma
S8 | 66 | F | Left frontal | Metastatic adenocarcinoma* | Metastatic colon adenocarcinoma | *colon primary
S9 | 56 | F | Right frontal | Metastatic carcinoma* | Metastatic breast carcinoma | *breast primary
S10 | 62 | M | Spinal cord | Metastatic carcinoma | Metastatic renal cell carcinoma |
S11 | 48 | M | Right occipital/parietal | Metastatic epithelioid tumor | Metastatic pulmonary adenocarcinoma |
S12 | 74 | M | Right temporal | GB | GB |
S13 | 26 | F | Right frontal | Slightly hypercellular* | Oligodendroglioma grade 2 | *few atypical cells, non-diagnostic
S14 | 62 | F | Right frontal | Infiltrating glioma | Anaplastic astrocytoma |
S15 | 50 | F | Right frontal | Malignant neoplasm* | Metastatic lung carcinoma | *consistent with carcinoma
S16 | 48 | M | Right frontal | GB | GB |
S17 | 82 | M | Left temporal | Anaplastic astrocytoma (G3) | GB |
S18 | 68 | M | Lateral/medial temporal lobe | High grade glioma | GB |
S19 | 48 | M | Right frontal | GB | GB |
S20 | 64 | M | Left frontal | High grade glioma | GB |
S21 | 68 | M | Left parietal | GB | GB |
S22 | 25 | M | Right parietal/occipital | Metastatic tumor | Metastatic mixed malignant germ cell tumor |
S23 | 52 | F | Right cerebellar | Metastatic carcinoma | Metastatic lung carcinoma* | *neuroendocrine carcinoma consistent with pulmonary primary lesion
S24 | 67 | M | Left clival | Neoplastic tissue* | Chondrosarcoma | *Differential includes chondrosarcoma and chordoma
S25 | 60 | M | Left frontal | Malignant neoplasm* | Lymphoma | *smear consistent with lymphoma
Supplemental Table 1B: Demographic information for the p53 analysis
Age
G1
46
Sex
M
Location
Left
Final
p53 stain reported in
diagnosis
pathology report
GB
p53+(no further info)
GB
Scattered giant cells are +
GB
Immunoreactivity ranging from
frontal
G2
58
M
Left
frontal
G3
77
F
Left
occipital
G4
39
M
Right
weak to strong
GB
Strongly diffusely positive
GB
Immunoreactivity to p53
frontal
G5
60
F
Left
temporal-occipital
Notes
G6
61
M
Left
GB
temporal
G7
53
M
Left
Diffuse immunoreactivity to
p53
GB
Weak to moderate- 30%
GB
expression noted in 50% of tm
frontal
G8
70
F
Right
parietal
G9
74
M
Right
cells
GB
temporal
G10
50
F
Left
cells
GB
frontal
G11
48
M
Right
55
M
Right
GB
82
M
Left
majority of tm cells- strong
reactivity
GB
frontal
G13
expression noted in 60% of tm
cells
frontal
G12
expression noted in 80% of tm
majority of tm cells- moderate
to strong reactivity
GB
strong reactivity- over 70%
GB
less than 10%
GB
moderate staining in occasional
temporal
G14
64
M
Left
frontal
G15
63
F
Right
frontal
G16
69
F
Left
temporal
tumor nuclei
GB
diffusely positive
G17
54
M
Left
GB
diffusely positive
Brain
Anaplastic
diffusely positive
mass*
astrocytoma
frontal
G18
31
M
Location info not
available-outside
consultation slide
G19
82
M
Right
GB
diffusely positive
GB
diffusely positive, strong
frontal
G20
73
M
Right
frontal
G21
54
M
Right
temporal
reactivity
GB
majority of tumor cells stained weak to moderate intensity