IEEE TRANSACTIONS ON MEDICAL IMAGING, VOL. 23, NO. 6, JUNE 2004

Surface Normal Overlap: A Computer-Aided Detection Algorithm With Application to Colonic Polyps and Lung Nodules in Helical CT

David S. Paik*, Christopher F. Beaulieu, Geoffrey D. Rubin, Burak Acar, R. Brooke Jeffrey, Jr., Judy Yee, Joyoni Dey, and Sandy Napel

Abstract—We developed a novel computer-aided detection (CAD) algorithm called the surface normal overlap method that we applied to colonic polyp detection and lung nodule detection in helical computed tomography (CT) images. We demonstrate some of the theoretical aspects of this algorithm using a statistical shape model. The algorithm was then optimized on simulated CT data and evaluated using a per-lesion cross-validation on 8 CT colonography datasets and on 8 chest CT datasets. It is able to achieve 100% sensitivity for colonic polyps 10 mm and larger at 7.0 false positives (FPs)/dataset and 90% sensitivity for solid lung nodules 6 mm and larger at 5.6 FP/dataset.

Index Terms—Colonic polyp, computed tomography colonography (CTC), computer-aided detection (CAD), cross-validation, lung nodule, statistical shape model.

I. INTRODUCTION

In the United States, lung cancer and colon cancer are the first and second leading cancer killers, respectively. Early detection of colonic polyps and lung nodules, the precursors to these diseases, has been shown to improve survival [1]–[4]. Clinically significant colonic polyps and lung nodules are resolvable given the spatial resolution of helical computed tomography (CT). However, the accuracy and efficiency of viewing hundreds of source axial images per exam are limited by human factors, such as attention span and eye fatigue. In response to this challenge, a variety of computer-aided diagnosis (CAD) methods have been developed to improve both the accuracy and the efficiency of detecting lesions in this and other difficult three-dimensional (3-D) diagnostic problems.
Among them, many different approaches to CAD for CT lung nodule detection and for CT colonic polyp detection have been developed, several of which are described next.

Manuscript received September 22, 2003; revised February 5, 2004. This work was supported in part by the National Institutes of Health (NIH) under Grant R01-CA72023 and Grant P41-RR09784. The Associate Editor responsible for coordinating the review of this paper and recommending its publication was G. Wang. Asterisk indicates corresponding author.
*D. S. Paik is with the Department of Radiology, Stanford University, Stanford, CA 94305-5450 USA (e-mail: paik@smi.stanford.edu).
C. F. Beaulieu, G. D. Rubin, R. B. Jeffrey, Jr., J. Dey, and S. Napel are with the Department of Radiology, Stanford University, Stanford, CA 94305-5450 USA.
B. Acar is with the Department of Electrical and Electronic Engineering, Bogazici University, 34342 Bebek, Istanbul, Turkey.
J. Yee is with the Department of Radiology, University of California at San Francisco, San Francisco, CA 94143 USA.
Digital Object Identifier 10.1109/TMI.2004.826362

For detecting lung nodules, Giger et al. [5] developed a two-dimensional (2-D) multilevel thresholding detection algorithm that creates a tree structure of image components. Rules were applied to shape features in order to identify nodules. A per-nodule sensitivity of 94% was achieved with 1.25 false positives (FPs) per patient. Armato et al. [6], [7] applied multilevel thresholding and a rolling ball algorithm toward detecting lung nodules. Shape and attenuation features were classified using linear discriminant analysis and the algorithm achieved 70% per-nodule sensitivity with 1.5 FPs per axial section. Brown et al. [8] have presented an algorithm for both detection and surveillance of lung nodules in CT. Region-growing and morphological operators were used to create candidate locations.
Attenuation, location, volume, and shape features were matched to model objects in a semantic net with fuzzy membership that serves as a generic a priori anatomic model. In the initial detection task, 86% per-nodule sensitivity was achieved with 11 FPs per patient. Lee et al. [9] used both genetic algorithm-based and semicircular template matching to identify initial candidates and attenuation, shape, and gradient feature rules to reduce FPs. They achieved 72% per-nodule sensitivity with 31 FPs per patient. Erberich et al. [10] applied the Hough transform (HT) for both 2-D circles and 3-D spheres using a rule-based classifier and achieved 30%–40% per-nodule sensitivity with a “large amount of false positive nodules.” Several approaches to colonic polyp CAD in CT colonography have also been proposed. Vining et al. [11] developed a method that measures abnormal wall thicknesses using heuristics. They report 73% per-polyp sensitivity with a range of 9–90 FPs per patient. Other approaches have analyzed the morphology of the mucosal surface. Summers et al. [12], [13] have developed a method that uses size, attenuation, and curvatures calculated with convolution-based partial derivatives to find polyps. They achieved 64% per-lesion sensitivity with 3.5 FPs per patient. Yoshida et al. [14]–[16] use shape index and curvedness (computed with partial derivatives), directional gradient concentration, and quadratic discriminant analysis. Using both prone and supine datasets, they achieve 100% per-patient sensitivity with 2.0 FPs per patient (per-polyp sensitivity not stated). Kiss et al. [17] combined surface normal and sphere fitting methods to achieve 100% per-polyp sensitivity with 8.2 FPs per patient. In addition, secondary CAD algorithms that are designed to reduce the FP rate of primary CAD algorithms have been proposed. Göktürk et al. 
[18] applied support vector machines to shape and attenuation features to reduce FPs and reported a 50% increase in specificity at a constant sensitivity level. Acar et al. [19] have applied edge displacement fields to reduce FPs and reported a 23% increase in specificity at a constant sensitivity level. Both of these FP reduction methods were evaluated using initial versions [20] of the work presented in this paper.

These previously described CAD algorithms for both lung nodules and colonic polyps have achieved varying levels of accuracy, although they all leave room for improvement. Additionally, many of them represent analogous approaches, using similar feature vectors and similar classifiers. The purpose of this work was to create a new and effective approach to CAD by developing new features and by optimizing them toward two clinical applications. We present in this paper 1) a novel multi-purpose CAD algorithm that we call the surface normal overlap (SNO) method, 2) a theoretical analysis of this algorithm using a statistical model of anatomic shape, 3) an optimization and analysis method for this algorithm using simulated CT data and a per-lesion cross-validation, and 4) preliminary evaluations of the detection performance of the CAD algorithm in lung nodule and colonic polyp detection, using the free-response receiver operating characteristic (FROC) paradigm. We propose to use this algorithm as the first step in a larger overall detection scheme and, thus, we strive for high sensitivity at a reasonable FP rate, allowing secondary FP reduction algorithms, such as some of those described above, and/or radiologist visualization to improve specificity.

II.
CAD ALGORITHM

The following sections describe the processing steps of the SNO method, the end result of which is a list of the coordinates of the center of each suspicious region, sorted in decreasing order of its prospect of being a lesion.

A. Pre-Processing and Segmentation

Because both colonic polyps and lung nodules are generally not much denser than water, high-density structures (e.g., bone) are removed by clamping voxel intensities to be no greater than that of water. Next, the CT volume data are made isotropic by tri-linear interpolation to 0.6 mm × 0.6 mm × 0.6 mm voxels. This is done in order to reduce any bias between lesions at different orientations and also to reduce any bias between datasets with different voxel sizes.

Next, segmentation is performed automatically to identify either the colon lumen or the lung parenchyma. A binary air image is created by thresholding all air intensity voxels, including air outside the body. This is followed by a negative masking of all air intensity voxels morphologically connected to any of the edges of the data volume (air outside the body), thus leaving only voxels with air density within the body. In the case of CT colonography, the inferior portions of the lungs are usually captured and are removed using a negative mask of a 3-D region-filling seeded with air intensity regions with a width or depth of greater than 60 mm in the most superior axial slice. Finally, small air pockets (with separate size limits for the colon datasets and for the lung datasets) are also negatively masked from the air image.

Next, a second binary image, S, is derived from the air image and is used to limit the remaining computations to voxels near the air-tissue interfaces in the colon or lung. This 1) reduces computational requirements and 2) eliminates FPs arising within soft tissue structures outside the region of interest. S begins as the surface voxels of the air image and is then morphologically dilated by 5 mm to produce a thickened region that contains the air-tissue interfaces of interest (see Fig. 1, rows 1–2).
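The clamping and search-space steps above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, water is taken as 0 HU, the air mask is assumed to be already segmented, and the 5-mm dilation is approximated by repeated 6-connected dilation steps (which grows a diamond-shaped rather than spherical neighborhood).

```python
import numpy as np

WATER_HU = 0.0  # assumption: intensities are in Hounsfield units, water = 0 HU


def clamp_to_water(volume_hu):
    """Clamp voxel intensities so that nothing is denser than water."""
    return np.minimum(volume_hu, WATER_HU)


def dilate6(mask, iterations=1):
    """Binary dilation with a 6-connected neighborhood, using only NumPy."""
    out = mask.astype(bool)
    for _ in range(iterations):
        p = np.pad(out, 1, mode="constant", constant_values=False)
        grown = p[1:-1, 1:-1, 1:-1].copy()
        grown |= p[:-2, 1:-1, 1:-1] | p[2:, 1:-1, 1:-1]
        grown |= p[1:-1, :-2, 1:-1] | p[1:-1, 2:, 1:-1]
        grown |= p[1:-1, 1:-1, :-2] | p[1:-1, 1:-1, 2:]
        out = grown
    return out


def search_space(air_mask, voxel_mm=0.6, dilation_mm=5.0):
    """Surface voxels of the air mask, thickened by roughly dilation_mm."""
    air = air_mask.astype(bool)
    # An air voxel is interior if all six neighbors are air (erosion of the mask).
    interior = ~dilate6(~air)
    surface = air & ~interior
    steps = int(round(dilation_mm / voxel_mm))
    return dilate6(surface, iterations=steps)
```

A production version would more likely use a library morphology routine with a spherical structuring element; the hand-written dilation is used here only to keep the sketch dependency-free beyond NumPy.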
B. Gradient Orientation

The gradient orientation step computes the image gradient vector at high-contrast edges in order to determine the 3-D orientation of the image surface normals. We have modified the Canny edge detector [21] to limit calculations to only those voxels contained in S. Each partial derivative is computed using two one-dimensional (1-D) Gaussian convolution kernels and one 1-D derivative-of-Gaussian convolution kernel, which are parameterized by their scales (standard deviations) and by their numbers of discrete samples, n. Our implementation additionally takes advantage of the greatly reduced search space, S, by only computing these separable 1-D convolutions where strictly necessary. The minimum locus of voxels necessary to correctly calculate the 1-D convolutions is calculated by morphologically dilating S, where each dilation extends S by half the number of kernel samples along the relevant axis, rounded down with the floor function (the greatest integer less than or equal to its argument). The convolutions are performed as follows. The separable convolutions along the first axis are calculated for each voxel in S dilated along the second axis and then dilated along the third axis. The separable convolutions along the second axis are then calculated for each voxel in S dilated along the third axis. The separable convolutions along the third axis are then calculated for each voxel in S. Nonmaximum suppression and hysteresis thresholding (thresholds of 100 HU and 200 HU) follow the separable convolutions. The resulting surface normal vectors point inward into the tissue.

C. Surface Normal Overlap

The surface normal overlap step is critical for detecting lesions. Each voxel in S accumulates a score proportional to the number of surface normals that pass through or near it. Generally speaking, both colonic polyps and lung nodules tend to have some convex regions on their surfaces and thus, the inward pointing surface normal vectors near these features tend to intersect or nearly intersect within the tissue.
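The accumulation idea can be illustrated with a short sketch: each inward normal is projected as a line into a score array, and the array is then smoothed with a separable Gaussian so that nearly intersecting normals also add up. This is a hedged toy version, not the paper's code. The function names, the half-voxel sampling along each normal (in place of exact scan conversion), and the kernel truncation at about two standard deviations are all assumptions made for illustration.

```python
import numpy as np


def gaussian_kernel(sigma_vox):
    """1-D Gaussian truncated at about +/- 2 sigma (roughly 95% of the curve)."""
    half = max(1, int(np.ceil(2.0 * sigma_vox)))
    x = np.arange(-half, half + 1, dtype=float)
    k = np.exp(-0.5 * (x / sigma_vox) ** 2)
    return k / k.sum()


def smooth_separable(volume, sigma_vox):
    """Apply the same 1-D Gaussian along each axis in turn."""
    k = gaussian_kernel(sigma_vox)
    out = volume.astype(float)
    for axis in range(out.ndim):
        out = np.apply_along_axis(
            lambda v: np.convolve(v, k, mode="same"), axis, out)
    return out


def surface_normal_overlap(shape, points, normals, max_len_vox, sigma_vox,
                           step=0.5):
    """Sum one projected line per surface normal, then blur into 'cylinders'."""
    acc = np.zeros(shape, dtype=float)
    for p, n in zip(points, normals):
        p = np.asarray(p, dtype=float)
        n = np.asarray(n, dtype=float)
        n = n / np.linalg.norm(n)
        visited = set()  # count each voxel at most once per normal
        for t in np.arange(0.0, max_len_vox + step, step):
            idx = tuple(int(i) for i in np.round(p + t * n))
            if idx in visited:
                continue
            visited.add(idx)
            if all(0 <= i < s for i, s in zip(idx, shape)):
                acc[idx] += 1.0
    return smooth_separable(acc, sigma_vox)
```

For normals taken from a roughly spherical surface, the smoothed array peaks near the center of the sphere, which is the behavior the method relies on.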
Pulmonary vessels in the lungs and haustral folds in the colon also have convex surfaces, but since they have a dominant curvature along a single direction (as opposed to high curvature in two directions as is common on the surfaces of polyps and nodules), the score for vessels and folds is generally less than that for nodules and polyps. A 3-D score array counts the number of surface normals that pass through or near each voxel in S, with one array element per voxel (see Fig. 1, row 3). In order to limit the contributions from normal vectors from very distant structures, the length of the projected surface normal vectors was defined as the scale of the largest spatial features of interest. Prior to any evaluation, this length was set to 10 mm.

Fig. 1. Intermediate steps of the CAD algorithm. Left column: CT colon data with a polyp. Right column: CT lung data with a nodule. Top row: cross sectional slice through an example lesion. Middle row: limited search space (S) from segmentation shown with semitransparent overlay. Bottom row: cross sectional slice through summed overlapping Gaussian profile cylinders shown in grayscale with white denoting highest CAD score.

Providing robustness to variations from perfectly spherical objects is critical to the success of this algorithm in real patient data. Our algorithm provides robustness both in the radial direction (objects with nonconstant distance from surface points to center) and in the transverse direction (objects with nonuniform magnitude of curvature). Robustness in the radial direction is provided by the fact that normal vectors can intersect at different distances from the surface (up to the projected length), thus allowing many nonspherical but roughly globular objects to have a significant response. Robustness in the transverse direction is provided by allowing skewed surface normal vectors (those that do not intersect but
nearly intersect) to be additive in the score array. This is accomplished by projecting cylinders of a finite width in the direction of the surface normal rather than by projecting line segments. Because surface normals that come closer to intersecting are assumed to be more likely generated by the same convex surface patch, the projected cylinders are given a transverse profile that gradually decreases in intensity at greater radial distances, thereby providing robustness in the transverse direction and, again, allowing many nonspherical but roughly globular objects to have a significant response. The profile was chosen to be Gaussian, parameterized by its scale (the cylinder scale).

For computational efficiency, the entire surface normal overlap step is implemented by first scan converting (i.e., discretizing into voxels) a line segment for each surface normal and summing it into the score array. Then, the Gaussian profile of the cylinders is achieved by a sequence of linearly separable 1-D convolutions to produce the final score array. The convolution (1) applies a 1-D Gaussian kernel along each of the three axes in turn. The discrete kernels are sampled at 0.6 mm × 0.6 mm × 0.6 mm and are chosen to include enough samples to cover 95% of the Gaussian curve. The computational burden imposed by the convolution calculation is minimized using morphological operators similar to the gradient orientation calculation, as in Section II-B.

Fig. 2. Cross section through the stochastic shape model. Dotted circle is the nominal sphere/cylinder of radius R. Solid contour is the shape after deviation from the nominal model. The deviated surface patch is shown as a small oval and the deviated surface normal direction as direction PQ.

D. Candidate Lesion Selection

The local maxima of the score array are selected as candidate lesion locations. However, complex anatomic structures with multiple convex surface patches may generate multiple local maxima. A separation distance was defined to be the smallest scale of the features
that might generate distinct local maxima and was set to 10 mm prior to any evaluation. Local maxima are considered in descending order, and if a local maximum occurs within this distance of an already accepted local maximum, the lesser value is assumed to be part of the same structure and is rejected. After this spatial filtering, the remaining local maxima are sorted in decreasing order and recorded as the potential lesion locations. The score for a potential lesion is the value of the score array at its location, and we refer to this as a "CAD hit."

III. THEORETICAL ANALYSIS

In this section, we present several theoretical analyses of SNO. To facilitate them, we have created a statistical anatomic shape model that balances the complex variability of human anatomy with sufficient simplicity to allow for analytic insight. We then use this theoretical model to compare the behavior of the SNO method to the 3-D Hough transform for spheres in distinguishing lung nodules from vessels and colonic polyps from haustral folds.

A. Stochastic Anatomic Shape Model

This shape model begins with a simple parametric shape and then adds stochastically governed variation in order to produce realistic anatomic shape. The nominal models for lung nodules and colonic polyps are spheres and hemispheres, respectively, while the nominal models for vessels and haustral folds are cylinders and half-cylinders, respectively. In order to account for anatomic variability, infinitesimal surface patches on the surface are then allowed to simultaneously vary from their nominal position at radius R in an implicit manner that preserves continuity between patches. In this analysis, radial position deviation is represented with a random variable, m, and surface normal direction variation is represented with a second random variable (see Fig. 2). We model each surface patch as deviating from its nominal position in the radial direction with a Gaussian distribution on m, with a mean of 1 and a standard deviation of σ.
We then model each surface normal vector as deviating from its nominal direction with two independent and identically distributed Gaussian variables, with zero mean and standard deviation of s. These two displacements are in the plane perpendicular to the radial direction, a unit distance away from the surface normal. For convenience, we represent directional deviation by its magnitude, u, which has a Rayleigh distribution, and by its angle, which has a uniform distribution on the interval [0, 2π). The probability density functions for m and u are parameterized by σ and s, respectively, and are given by

$$f_m(m) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(m-1)^2}{2\sigma^2}\right)$$

and

$$f_u(u) = \frac{u}{s^2} \exp\left(-\frac{u^2}{2 s^2}\right), \quad u \ge 0.$$

Thus, the variability of a shape of radius R is represented by the random variables m and u and by the angle.

As each surface patch varies from its nominal position and direction, the solid angle subtended by the patch stays constant but the area of the patch changes due to 1) the magnification factor at different radial distances and 2) a cosine inverse proportionality as it is tilted away from its nominal direction. We relate the area of the nominal surface patch to the area of each surface patch after variation. For spheres, the surface patches are indexed by two angular coordinates; for cylinders, they are indexed by an angle around the axis of the cylinder and by position down the length of the axis. In each case, the relationship between the two areas combines the two factors above.

Fig. 3. Examples of theoretical model parameter estimation. The surface normals that belong to the structure are shown as lines on the solid surface and the nominal sphere/cylinder model is shown as partially translucent with perspective. (a) polyp, (b) haustral fold, (d) lung nodule, (e) pulmonary vessel. Histograms of m and u normalized to a unit Gaussian and unit Rayleigh are shown for each shape class and compared to the parametric models in (c) and (f).

B.
Model Parameter Estimation

In order to make quantitative comparisons using this theoretical model, the parameters controlling the degree of variation from the nominal shapes, σ and s, were estimated directly from the patient datasets. This process involved: 1) performing edge detection on the datasets; 2) identifying the surface normal vectors that belong to the nodule, polyp, vessel or fold; 3) finding the nominal sphere or cylinder that fit those surface normal vectors; 4) computing the value of m and u for each surface normal; and 5) estimating σ and s from those sample populations. All polyps 5 mm and larger and all nodules 3 mm and larger were used for parameter estimation. From each of the 8 colon datasets and each of the 8 lung datasets, eight folds or vessels were prospectively and manually selected for parameter estimation. Thus, our analyses included 18 polyps and 64 selected folds in the colon, and 84 nodules and 64 selected vessels in the lung. Section V-C contains full details about these datasets.

Edge detection was performed as described in Section II-B. Isolation of the surface normals belonging to the structure of interest was performed as follows. The center of the structure of interest was chosen manually, and all surface normal vectors whose bases were further than two radii away were eliminated. Next, if a line segment between the center and a surface normal intersected an air intensity voxel, the surface normal was eliminated. Then, surface normals pointing more than 90° away from the center were eliminated. Finally, the largest contiguous region of surface normals was kept as belonging to the structure of interest. This algorithm was quite successful in isolating the structures of interest; Fig. 3 provides some examples.

The nominal shape was derived using a least squares fit of a sphere or cylinder to the bases of the surface normals (i.e., directional information was not used), which leads to an estimate of R.
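As an illustration of the sphere case, the sketch below fits a sphere to a set of points with a linear (algebraic) least squares formulation. The text does not say which least squares formulation was used, so the algebraic form, the function name `fit_sphere`, and the use of NumPy are assumptions made for this example; the cylinder case is not covered.

```python
import numpy as np


def fit_sphere(points):
    """Algebraic least squares sphere fit.

    Uses |p|^2 = 2 c.p + (R^2 - |c|^2), which is linear in the unknowns
    (c_x, c_y, c_z, R^2 - |c|^2).
    """
    p = np.asarray(points, dtype=float)
    design = np.hstack([2.0 * p, np.ones((len(p), 1))])
    target = (p ** 2).sum(axis=1)
    solution, *_ = np.linalg.lstsq(design, target, rcond=None)
    center = solution[:3]
    radius = np.sqrt(solution[3] + center @ center)
    return center, radius
```

With the fitted center and radius in hand, the radial deviation of each surface normal base follows as its distance from the center divided by the radius.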
Using this nominal shape, both m and u were computed directly for each surface normal. Finally, σ was estimated using the maximum likelihood estimate. However, s was estimated differently because a few surface normals were at nearly 90° away from either the center of the sphere or the axis of the cylinder, leading to nearly infinite values of u and thus giving inaccurate estimates due to MLE sensitivity to outliers. This happened particularly around the "skirt" of polyps and haustral folds, where it is difficult to make binary decisions about what belongs to the polyp or fold and what does not. Instead, we estimated s by using the method-of-moments but substituting the more robust median estimator for the mean. By setting the empirical median, $\tilde{u}$, equal to the point at which the CDF is 0.5, we get

$$1 - \exp\left(-\frac{\tilde{u}^2}{2 s^2}\right) = 0.5$$

which leads to

$$\hat{s} = \frac{\tilde{u}}{\sqrt{2 \ln 2}}.$$

The results of the parameter estimation are shown in Fig. 4.

Fig. 4. Boxplots showing the minimum, maximum, interquartile range, and median for the theoretical shape model parameters: (a) R, (b) σ, and (c) s.

C. Algorithm Models

In order to understand the theoretical performance of the SNO algorithm and to compare it to the Hough transform for spheres, we applied this anatomic shape model for polyps, folds, nodules, and vessels with differing degrees of variation from the nominal model. Specifically, we compared the expectation of the CAD scores over the random variables m and u and the angle. See Appendix for the CAD score formulas and their derivations.

D. Theoretical Comparison of SNO and HT

In order to compare the theoretical performance of the SNO and HT algorithms, we varied the shapes from perfect spheres and cylinders to more realistic anatomic shapes. The range of realistic shape variability was estimated as described in Section III-B. Fig. 5(a)–(b) presents resulting CAD scores as a function of deviation from ideal shape for polyps and folds, and for nodules and blood vessels. These plots demonstrate the robustness of SNO to deviation from ideal shapes, whereas HT fails to discriminate between shapes with realistic amounts of shape variability. Fig. 5(c) presents the scores of both polyps and nodules as a function of lesion size, revealing that larger lesions lead to a smoothly increasing response with SNO. However, HT produces a response that varies tremendously for nearly identical lesion sizes.

The estimated parameter values were then used to produce a CAD score for each shape. These scores are shown in Fig. 6. Wilcoxon rank sum tests were performed to test the difference between TP and FP scores. For SNO, there were significant differences between polyp and fold and between nodule and vessel. However, for HT, there were not significant differences between polyp and fold nor between nodule and vessel.

Fig. 5. Results from the theoretical model. In (a) and (b), σ and s are simultaneously varied from 0 to twice their median values. (a) Colon CAD scores as a function of σ and s. (b) Lung CAD scores as a function of σ and s. (c) CAD scores as a function of lesion radius, R, at the median values of σ and s.

Fig. 6. (a) Boxplot showing SNO scores using the theoretical model in the colon, (b) HT scores in the colon, (c) SNO scores in the lung, and (d) HT scores in the lung.

IV. GRADIENT ORIENTATION OPTIMIZATION

Using simulated CT phantoms, we optimized the gradient orientation kernel scale parameters in order to yield the most accurate gradient orientations. This step was critical because errors in estimating the gradient direction can diminish surface normal overlap in the score array. The selection of values for these scales is particularly important because too small a value will lead to errors from noise and very localized perturbations in the surface, whereas too large a value will lead to insensitivity to smaller lesions.
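To make the role of the kernel scale concrete, the sketch below computes a 3-D gradient with separable Gaussian and derivative-of-Gaussian kernels, in the spirit of Section II-B. It is a simplified illustration under assumptions: the names are hypothetical, the scale is given in voxels, kernels are truncated at about two standard deviations, and the restriction to the search space, nonmaximum suppression, and hysteresis thresholding are all omitted.

```python
import numpy as np


def gaussian_kernels(sigma_vox):
    """Sampled 1-D Gaussian and its derivative at the given scale."""
    half = max(1, int(np.ceil(2.0 * sigma_vox)))
    x = np.arange(-half, half + 1, dtype=float)
    g = np.exp(-0.5 * (x / sigma_vox) ** 2)
    g /= g.sum()
    dg = -x / sigma_vox ** 2 * g  # derivative of the Gaussian
    return g, dg


def convolve_axis(volume, kernel, axis):
    """1-D convolution along one axis (zero padded at the borders)."""
    return np.apply_along_axis(
        lambda v: np.convolve(v, kernel, mode="same"), axis, volume)


def gradient(volume, sigma_vox):
    """Gradient vector per voxel; the last axis holds the three components."""
    g, dg = gaussian_kernels(sigma_vox)
    vol = volume.astype(float)
    components = []
    for d in range(3):
        out = vol
        for axis in range(3):
            # Differentiate along axis d, smooth along the other two axes.
            out = convolve_axis(out, dg if axis == d else g, axis)
        components.append(out)
    return np.stack(components, axis=-1)
```

A larger `sigma_vox` averages over more voxels, which suppresses noise but blurs small structures; this is the trade-off the optimization in this section addresses.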
The optimization of the cylinder scale parameter is described later in Section V.

A. Phantom Model and Error Metric

A series of hemispherical phantom objects were "scanned" using software that simulates CT scanning including forward projections, partial volume effects, correlated CT noise, helical interpolation, and filtered backprojection reconstruction [22]. The simulations were performed with a 3-mm slice thickness, pitch of 2, 0.7 mm × 0.7 mm pixels in plane, and 1-mm reconstruction interval. For each phantom, a water-equivalent density sphere was embedded halfway into a water-equivalent density, randomly oriented, flat wall to simulate a prototypical colonic polyp or prototypical lung nodule on the chest wall. The diameters of the spheres, d, ranged from 5 to 15 mm at 1-mm increments, chosen to demonstrate the effects of changes in size from the prototypical 10-mm lesion. For each sphere size, there were 10 phantoms, each with a different wall orientation, randomized subvoxel offset, and randomized CT detector noise, leading to a total of 110 phantoms.

The error metric used to evaluate the accuracy of gradient orientations, e, was defined to be the mean perpendicular distance from the surface normal vector to the true center of the sphere. In order to include only detected gradient orientations from the hemisphere and not those from the flat wall, only gradients located within 1.05 sphere radii of the sphere center entered into the calculation of e (see Fig. 7).

B. Gradient Orientation Kernel Scale

We set the three kernel scales equal to one another and executed the CAD algorithm while simultaneously varying all three from 0.05 to 4.0 mm in 0.05-mm increments. The errors, e, from the 10 phantoms at a given kernel scale and d were then averaged. The results of this optimization are plotted in Fig. 8, showing that a kernel scale of 1 mm led to the least error across all lesion sizes.

Fig. 7. Left: simulated CT phantom of a prototypical lesion consisting of a sphere embedded halfway into a flat wall.
Right: an oblique CT cross-section through a phantom lesion showing how e is calculated as the mean distance (black line segments) from normal vectors (arrows) to the true sphere center (white dot).

Fig. 8. The accuracy of the gradient orientations is dependent on the kernel scale and the size of the lesion (d ranging from 5 to 15 mm). The error metric is minimized across all lesion sizes at a kernel scale of 1 mm. The rippling effect is an artifact due to discontinuous jumps in the number of samples in the convolution kernels, n, at different values of the kernel scale.

Fig. 9. The accuracy of the gradient orientation step on tri-linearly interpolated data is relatively independent of kernel scale anisotropy as the through-plane scale varies from the in-plane scale by a factor of 0.5–2.0. d ranges from 5 to 15 mm and the in-plane scale was held constant at 1 mm while the through-plane scale was varied.

C. Gradient Orientation Kernel Anisotropy

We also investigated the effects of anisotropic resolution in helical CT on gradient orientations. In general, helical CT has lower effective resolution through-plane than in-plane, regardless of reconstruction interval. Thus, one might expect that setting the through-plane kernel scale differently from the in-plane scale would compensate for this effect. To test this, we fixed the in-plane scale at 1 mm (based on the results of the first optimization) and let the through-plane scale vary from 0.05–4.0 mm in 0.05-mm increments. The errors, e, from the 10 phantoms at a given through-plane scale and d were then averaged, as in Section IV-B. Fig. 9 plots the results of this optimization, showing that the accuracy of the gradient orientations on tri-linearly interpolated anisotropic data is almost independent of the anisotropy of the kernel scale between ratios of 0.5 and 2.0. As a result, all subsequent experiments were carried out with an isotropic kernel scale of 1 mm.

V. CAD PERFORMANCE EVALUATION

This section describes two experiments that were performed in order to evaluate the performance of the CAD algorithm in detecting real colonic polyps and in detecting solid lung nodules.
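For reference, the gradient orientation error metric e from the phantom experiments (the mean perpendicular distance from each surface normal line to the true sphere center) can be written compactly. The sketch below is illustrative only; the function name and the optional `max_dist` cutoff parameter, which stands in for the distance limit used to exclude gradients on the flat wall, are assumptions.

```python
import numpy as np


def orientation_error(bases, normals, center, max_dist=None):
    """Mean perpendicular distance from each normal line to the true center.

    bases, normals: (N, 3) arrays of normal base points and directions.
    max_dist: optional cutoff; normals based farther than this from the
    center are left out of the mean.
    """
    bases = np.asarray(bases, dtype=float)
    normals = np.asarray(normals, dtype=float)
    rel = np.asarray(center, dtype=float) - bases
    if max_dist is not None:
        keep = np.linalg.norm(rel, axis=1) <= max_dist
        rel, normals = rel[keep], normals[keep]
    if len(rel) == 0:
        raise ValueError("no surface normals within the cutoff distance")
    unit = normals / np.linalg.norm(normals, axis=1, keepdims=True)
    # The distance from a point to a line is |rel x unit| for a unit direction.
    return float(np.linalg.norm(np.cross(rel, unit), axis=1).mean())
```

A perfectly oriented normal on a sphere passes through the center and contributes zero to the mean.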
In Section IV, the goal of the gradient orientation optimization was to produce as accurate gradient orientations as possible using the e error metric. However, the goal of optimizing the cylinder scale is to maximally differentiate between lesions and FPs. A wider cylinder scale will increase the robustness to deviations from perfectly spherical shapes but will also decrease the differentiability between lesions and FP structures. Since this effect is dependent on the variability of true lesion and FP shapes, the CT simulations were insufficient. Instead, a cross-validation was performed to evaluate lesion detection performance with prospectively chosen values of the cylinder scale.

A. Cross-Validation

When cross-validation is used to evaluate a classifier, the dataset is split into a number of sets (sometimes referred to as "folds"). The classifier is trained de novo on some of the sets and then evaluated on the remaining independent set(s); this is repeated for all possible divisions into training sets and test sets, and the results are averaged in a reasonable manner [23]. This type of evaluation gives an unbiased estimate of performance and has a lower standard error than traditional holdout methods [24]. However, in a detection problem such as this, splitting the dataset (CAD hits) into sets at the granularity of lesions is problematic because the result of training (e.g., selecting the cylinder scale based on the training sets) changes the dataset (e.g., CAD hit locations) and thus changes the sets themselves. In order to retain the independence between training and test sets, the sets were selected on the basis of distinct anatomic features rather than on the basis of CAD hit locations. The locus of all CAD hit locations was computed for each possible value of the cylinder scale. Two CAD hits were considered to be the same anatomic feature if they were within 10 mm of each other.
The sets (7 in the colon dataset, 46 in the lung dataset) were made to be disjoint by having one true positive (TP) lesion and equal numbers of randomly selected FP anatomic features. CAD hits were distributed among the sets according to the anatomic feature to which they belonged. Thus, to within the 10-mm constraint, no two sets contained CAD hits on the same anatomic feature.

As the criterion in training, we selected the value of the cylinder scale that maximized A, which we define as the normalized partial area under the FROC curve from 0–20 FPs/dataset and from 90%–100% sensitivity. A value of 1 indicates perfect detection performance, whereas a value of 0 indicates that less than 90% of lesions are detected by the time 20 FPs/dataset is reached. The geometric interpretation of A is that it is an area that represents how close the curve comes to perfect performance (i.e., 100% sensitivity at 0 FPs/dataset). No probabilistic conclusions based on A are drawn in this paper, and it should be noted that the probabilistic interpretation of A has not been fully explored.

B. Automated CAD Scoring

Because the results of a CAD evaluation can be especially numerous, we implemented a method for automatically scoring each CAD hit as either a TP or FP, thus eliminating subjectivity and clerical errors. Because there is some spatial variance in what is declared to be the center of a lesion in the gold standard (Section V-C2), the scoring algorithm must allow for some small amount of spatial mis-registration between gold standard lesion locations and TP CAD hits. Thus, we defined any CAD hit as a TP if it was within half the lesion's measured diameter from the lesion's measured center. Lesion diameters were measured manually from the CT images during the setting of the gold standard using a multi-planar digital caliper tool.
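The TP rule above amounts to a simple distance test against each gold standard lesion. The following sketch shows one way to express it; the data layout (tuples of score and position for hits, and of identifier, center, and diameter for lesions), the function name, and the inclusive comparison at exactly half a diameter are assumptions made for illustration.

```python
import math


def label_hits(hits, lesions):
    """Label each CAD hit with the lesion it falls on, or None for an FP.

    hits: list of (score, (x, y, z)) in mm.
    lesions: list of (lesion_id, (x, y, z), diameter_mm) from the gold standard.
    """
    labeled = []
    for score, position in hits:
        match = None
        for lesion_id, center, diameter in lesions:
            # TP if the hit lies within half the measured diameter of the center.
            if math.dist(position, center) <= diameter / 2.0:
                match = lesion_id
                break
        labeled.append((score, match))
    return labeled
```

For example, with a 12-mm polyp centered at (10, 10, 10), a hit at (13, 10, 10) is labeled with that polyp, while a hit at (30, 10, 10) is labeled `None`.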
To determine the overall performance, all of the CAD hits within the test sets were determined to be a TP or an FP and then pooled and sorted in descending order of score. In the event that multiple CAD hits were scored as TP for a given lesion, only the highest scoring hit was considered a TP; lower scoring hits were ignored. TP CAD hits on lesions below the size range of interest were not considered FPs, nor did they increase the sensitivity. At a given score threshold, sensitivity was calculated as the percentage of lesions within the size range of interest that had been identified by a TP CAD hit above that score threshold. The FP rate was calculated as the total number of FP CAD hits divided by the number of datasets.

Fig. 10. Per-lesion cross-validation training. The range of values of the cylinder scale that was selected is shown in the shaded areas. For reference, the partial-area metric A across all the datasets (i.e., no cross-validation) is shown by the solid lines. Note that the method is able to prospectively choose values (shaded area) near the true optimum (maxima of solid lines) with low variance.

C. Detection Evaluation

1) Data Collection: Colon: From a database of 116 CT colonography exams performed at either Stanford University or the San Francisco VA hospital, 8 exams were selected for this study in order to include a reasonably large number of colonic polyps and to balance the number of patients with and without large polyps. Exams with excessive image artifact or retained water were excluded. Case selection was done without regard to polyp conspicuity or shape. These 8 patients were given rectal air contrast and scanned in the supine position with single- or multidetector helical CT (GE HiSpeed/CTi or LightSpeed, General Electric Medical Systems, Milwaukee, WI) with effective section width of 2.5–3.75 mm and 50% overlapping reconstruction. Immediately following CT scanning, each patient also underwent fiber-optic colonoscopy (FOC).
These results were correlated to the CT images with a total of 7 “clinically significant” polyps (≥10 mm) found in 4 of 8 patients and a total of 11 small polyps (5–9 mm) found in 3 of 8 patients. A wide range of polyp shapes was present in the datasets.

2) Gold Standard: Colon: A study coordinator with extensive experience in CTC and blinded to CAD results carefully reviewed the CTC data and recorded the location and diameter of polyps found by FOC into the gold standard database. Only one significant polyp (measured as 15 mm by FOC) could not be located in the CT images, most likely due to retained water. A total of 10 small polyps (1 was 8 mm and 9 were 5–6 mm measured by FOC) could not be located in the CT images.

Lung: The gold standard in the chest was established by consensus of two radiologists interpreting the axial CT data. Both the locations and diameters of nodules were recorded into the gold standard database.

The CAD algorithm was then executed on the 8 colon and 8 lung datasets using the optimized gradient orientation from Section III and cross-validated as described above.

3) Results: Colon: In the colon datasets, the value of the cylinder scale selected based on the cross-validation training sets was in the range 2.2–2.6 mm with a mean of 2.5 mm. Fig. 10 shows the range of these values compared to the performance on all of the datasets (the partial-area metric computed on all datasets over all values of the cylinder scale, not just on training sets). Note that the latter is shown in this figure for reference only and was never used in training or evaluation. The mean performance across the test sets in detecting “clinically significant” colonic polyps was as follows. in diameter were detected at 4.6 FPs/dataset. 90% were detected at 6.0 FPs/dataset. 95% were detected at 6.5 FPs/dataset. 100% were detected at 7.0 FPs/dataset. Fig. 11 shows an FROC plot of these results.
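The FROC construction described in Section V-B and the partial-area training metric of Section V-A can be sketched together. The Python below is a hedged illustration rather than the authors' code: it treats the FROC curve as a step function, holds the final sensitivity constant out to 20 FPs/dataset when the hits run out earlier, and omits the special handling of lesions below the size range of interest. The names `froc_points` and `normalized_partial_area` and the toy data are hypothetical.

```python
def froc_points(scored_hits, n_lesions, n_datasets):
    """Return [(fps_per_dataset, sensitivity)] as the score threshold is lowered.

    scored_hits: list of (score, lesion_index or None); None marks an FP.
    """
    found, n_fp, points = set(), 0, [(0.0, 0.0)]
    for _, lesion in sorted(scored_hits, key=lambda h: h[0], reverse=True):
        if lesion is None:
            n_fp += 1
        else:
            found.add(lesion)  # lower-scoring repeat hits on a lesion add nothing
        points.append((n_fp / n_datasets, len(found) / n_lesions))
    return points


def normalized_partial_area(points, max_fp=20.0, min_sens=0.9):
    """Area of the FROC step curve inside [0, max_fp] x [min_sens, 1], scaled to [0, 1]."""
    pts = points + [(max_fp, points[-1][1])]  # hold the last level out to max_fp
    area = 0.0
    for (x0, y0), (x1, _) in zip(pts, pts[1:]):
        lo, hi = min(x0, max_fp), min(x1, max_fp)
        area += (hi - lo) * max(y0 - min_sens, 0.0)
    return area / (max_fp * (1.0 - min_sens))


# Toy example: 2 lesions in 1 dataset; both are found after a single FP.
scored_hits = [(0.9, 0), (0.8, None), (0.7, 1), (0.6, 0), (0.5, None)]
points = froc_points(scored_hits, n_lesions=2, n_datasets=1)
area = normalized_partial_area(points)
```

In this toy case, 100% sensitivity is reached at 1 FP/dataset, so the curve covers 19 of the 20 FP units in the 90%–100% band and the metric is approximately 0.95.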
A manual analysis of the 50 highest scoring FPs in each colon dataset (400 total) revealed that 86% were due to haustral folds, 5% were due to the colon wall between adjacent loops, and 4% were due to a failure in segmentation in one dataset that captured the air trapped in the blanket beneath the patient. Finally, each of the following classes contributed 1% or less: stool, insufflation catheter, small bowel, and the ileocecal valve.

Fig. 11. Per-lesion cross-validation evaluation. FROC results for both colonic polyp and lung nodule detection.

Lung: In the lung datasets, the value of the cylinder scale selected based on the cross-validation training sets was in the range 0.6–0.8 mm with a mean of 0.6 mm (see Fig. 10). The mean performance across the test sets in detecting “clinically significant” solid lung nodules was as follows. in diameter were detected at 1.3 FPs/dataset. 90% were detected at 5.6 FPs/dataset. 95% were detected at 63 FPs/dataset. 100% were detected at 165 FPs/dataset (see Fig. 11). A manual analysis of the 50 highest scoring FPs in each lung dataset (400 total) revealed that 69% were due to pulmonary vessels, 13% were due to bronchi, 6% were due to vessels or bronchi in the mediastinum, 6% were calcified nodules, 2% were due to bulges on the pleural surface, 2% were small indeterminate opacities, and each of the following classes contributed 1% or less: mass, metal artifact, and a single 2.9-mm noncalcified nodule.

VI. DISCUSSION

A. CAD Algorithm

The surface normal overlap algorithm was originally inspired by the Hough transform for spheres [25] but differs in some important ways. First, the array that counts the number of overlapping or nearly overlapping surface normals is similar to the Hough transform accumulator array in that it sums “votes” for objects that could produce those normals, but it does not require all of the votes to correspond to a single parameterized sphere, as does the Hough transform.
Second, the Hough transform, in its various forms, is highly specific for one type of shape (e.g., spheres). Even the variant known as the generalized Hough transform, which avoids parametric representations, requires a specific model. This specificity is desirable when the shape to be detected can be precisely defined in advance, but the specificity is problematic when it cannot. In contrast, the SNO algorithm does not use an explicit model of a single type of shape, but instead uses an implicit model to represent an entire gamut of shapes much larger than the set of spheres detected by the Hough transform for spheres. This property is extremely important when the objects to be detected can have significant variability in shape such as with lung nodules or colonic polyps. Rather than specifying the exact shape to be detected, the SNO algorithm defines a fuzzy constraint on surface normal orientation in order to define the varied set of shapes to be detected, both by allowing angular mismatches (transverse robustness) and by allowing edges at different radial distances to sum (radial robustness). For comparison, Erberich et al. have applied the 3-D Hough transform for spheres toward lung nodule detection in CT but reported only 30%–40% sensitivity at a high FP rate [10]. The voxel intensity clamping pre-processing step is used to eliminate edges due to bone but will also make calcified lung nodules have a similar response as noncalcified nodules. While the presence of calcification in lung nodules may help distinguish between benign and malignant nodules, the goal of this algorithm is detection, not classification. For classification purposes, the original voxel intensities can easily be restored following the detection of suspicious regions. 
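To make the voting idea discussed in this section concrete, the following is a toy 2-D Python analogue of surface normal overlap. It is a sketch under stated assumptions, not the published 3-D implementation: each edge point projects a Gaussian-profiled line segment along its inward normal, and the contributions are summed on a grid. The function names, the grid size, and the values of `sigma` and `length` are all hypothetical.

```python
from math import cos, exp, pi, sin


def sno_accumulate(edges, size, sigma=1.0, length=10.0):
    """Sum Gaussian-profiled segments projected along each inward normal.

    edges: list of ((x, y), (nx, ny)) with unit inward normals.
    Returns a size x size grid of scores (a 2-D stand-in for the 3-D volume).
    """
    grid = [[0.0] * size for _ in range(size)]
    for (px, py), (nx, ny) in edges:
        for gy in range(size):
            for gx in range(size):
                dx, dy = gx - px, gy - py
                along = dx * nx + dy * ny        # position along the normal
                if 0.0 <= along <= length:
                    across = dx * ny - dy * nx   # signed distance from the normal line
                    grid[gy][gx] += exp(-across * across / (2.0 * sigma * sigma))
    return grid


def circle_edges(cx, cy, r, n=36):
    """Edge points on a circle with normals pointing toward the center."""
    return [((cx + r * cos(a), cy + r * sin(a)), (-cos(a), -sin(a)))
            for a in (2.0 * pi * k / n for k in range(n))]


def wall_edges(y, x0, x1, n=36):
    """Edge points on a straight wall with parallel normals (+y direction)."""
    step = (x1 - x0) / (n - 1)
    return [((x0 + k * step, y), (0.0, 1.0)) for k in range(n)]


# Peak score for a round blob versus a flat wall with the same number of edge points.
blob = max(max(row) for row in sno_accumulate(circle_edges(15, 15, 6), 31))
wall = max(max(row) for row in sno_accumulate(wall_edges(5, 3, 27), 31))
```

With the same number of edge points, the converging normals of the round shape pile up at its center, whereas the parallel normals of the wall never intersect and produce a much lower peak. Because the Gaussian profile tolerates normals that only nearly intersect, no single parameterized sphere has to be fitted.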
While we have presented and evaluated the SNO CAD method as being preceded by a specific segmentation scheme, we emphasize that the only purposes of the segmentation step are 1) to reduce computation by targeting only anatomical regions of interest (ROIs) and 2) to eliminate hits from regions disjoint from the anatomical ROI. Unlike some other CAD approaches requiring extremely accurate segmentation (e.g., inclusion of juxtapleural nodules), the goal of our segmentation step is to provide a volumetric region that contains all possible image edges that could be due to the presence of lesions and not to fully delineate their edges. Thus, our relatively simplistic segmentation algorithm is sufficient and is not a limiting factor in the overall detection performance. However, it is possible that gross errors in segmentation could adversely affect performance. In colon CAD, for example, if the most superior axial slice contains a large enough portion of the transverse colon, it could be assumed to be lung and, therefore, erroneously eliminated. Thus far, we have not had any such gross failures in segmentation; however, we note that, ultimately, any robust segmentation method could be substituted for ours.

One limitation of this study is that we did not formally optimize the pre-processing and segmentation as we did many of the other parameters of the algorithm. This stage was used to create a thick boundary region containing easily detected high contrast edges, and it was very robust to a wide variety of parameter settings. Our experience was that these parameters had little or no effect on the overall performance.

We also emphasize that, at this stage, this algorithm is not intended to be used independent of visual interpretation by a radiologist. At the present stage of development (and perhaps, well into the foreseeable future), this type of algorithm should be seen as an aid for improving radiologist performance.
In this regard, although our algorithm generated more than one FP (on average) per dataset in order to achieve high sensitivity, this does not indicate that the majority of patients will have FP detections once reviewed in conjunction with a radiologist. If many of the FP hits are recognized as such and are discarded by the radiologist, overall performance may be acceptable. However, this remains to be shown by future evaluations.

The difference in performance between detecting colonic polyps and detecting lung nodules is of interest. Although neither FROC curve completely dominates the other, the “average” lung nodule (i.e., near 50% sensitivity) was easier to detect (i.e., at fewer FPs/dataset) than the “average” colonic polyp, despite the relatively larger size of polyps. The difference is partially accounted for by the difference in lesion morphology: the gross shape of the nodules (not at the fine level of spiculations) in these datasets tended to be more globular than that of polyps, which tended to have more complex surfaces due to the gradual rise and fall of the mucosal surface around a polyp. This may be attributable to the relatively isotropic growth pattern of lung nodules in lung parenchyma as compared to the anisotropic growth pattern of colonic polyps, which emerge and protrude from the colon wall. Another factor in the different detection performance is that nodules usually have detectable edges on their entire surface compared to polyps, which have detectable edges only on their outer half and thus have half the number of overlapping surface normals. The difference in performance is also accounted for by the completely dissimilar sources of FPs (e.g., background anatomy), which are very different in both appearance and quantity between the lung and colon.

The hardest to find nodules (i.e., near 100% sensitivity) were, however, harder to find than the hardest to find polyps.
This was due to several exceptional nodules whose appearance was different from that of most other nodules. These four nodules accounted for the range of sensitivity from 91%–100% and were the only nodules that were detected at greater than 8 FPs/dataset. These included three small, elliptical nodules on the chest wall (6 × 3 mm, 7 × 4 mm, and 6 × 4 mm) and one very irregular nodule at the apex of the lung (20 × 16 mm); see Fig. 12.

Fig. 12. The four lung nodules that were hardest to detect, accounting for the sensitivity from 91%–100% and accounting for all detected nodules at greater than 8 FPs/dataset. Nodule sizes are (a) 6 × 3 mm, (b) 7 × 4 mm, (c) 6 × 6 mm, and (d) 20 × 16 mm.

B. Theoretical Analysis

While both SNO and HT perform similarly on perfect spheres and cylinders, the results shown in Fig. 5(a)–(b) demonstrate that HT rapidly loses its ability to distinguish between sphere and cylinder as the shape variability approaches realistic levels. On the other hand, SNO retains its shape discrimination under much greater levels of shape variability. Additionally, Fig. 5(c) demonstrates that slight differences in size can lead to very different HT scores due to the interaction of and . The theoretical CAD scores (see Fig. 6) suggest that SNO is better able than HT to distinguish between the presented shapes. While HT is valued for its specificity to the parametric model (e.g., spheres), it is the ability to detect shapes that vary from the nominal shape model that makes SNO particularly suitable for discriminating anatomic shapes.

In the theoretical shape model, note that statistical independence between neighboring surface patches is not assumed since this would be very unrealistic. However, not assuming independence limits the analysis to the score at the center of the shape instead of the maximum score over the whole shape (the expectation of the maximum is not the maximum of the expectations; see the Appendix). Although HT scores may be higher off center, this would require nearly spherical subportions of the surface in order to yield higher scores off center. The theoretical model does assume independence between and . The correlation coefficients between and for polyps, folds, nodules, and vessels were 0.22, 0.04, 0.06, , respectively. Although very low correlation does not rule out dependence, it helps to justify this first-order approximation.

In our formulation of the SNO method, we have chosen to project Gaussian-profiled cylinders for each surface normal. One limitation of this work is that this choice of projected shape may not be optimal with respect to the theoretical model. This is an area of future work on this algorithm that we plan to investigate further.

C. CAD Algorithm Optimization

The design of the simulated phantoms was an important factor in the optimization of the gradient orientations. The simulated hemisphere on a flat wall model was designed to find the optimal balance between the decreased noise from greater blurring and the increased sensitivity to small objects from lesser blurring. The use of a hemisphere on a flat wall is an obvious first-order model for colonic polyps. We had originally tried optimizing gradient orientations for lung nodules on spherical phantoms. However, it was necessary to include other background anatomic structures (e.g., flat wall) in order to realistically model the effect of a large gradient orientation convolution kernel. With a large kernel, nearby but distinct anatomic structures would contribute to the convolution and cause error in the gradient orientation. This was balanced against the effect of a small kernel, which had a decreased noise reduction benefit due to less blurring.

The hemisphere on a flat wall also serves as a model for lung nodules in contact with the chest wall, which are not the most common type of lung nodule but are anecdotally more difficult to detect than contact-free nodules by this CAD algorithm.
We chose to optimize the gradient orientation step for this type of lung nodule because we wanted the algorithm to perform as well as possible on these difficult to detect lesions.

The results of the gradient orientation kernel anisotropy optimization were initially unexpected by the authors. The Canny edge detector is designed so that the blurring is performed by the Gaussian and derivative of Gaussian kernels. Because most CT images are inherently blurred more in the longitudinal direction (i.e., the z-direction), we originally hypothesized that an anisotropic kernel would compensate and produce more accurate gradient orientations. However, the experiment showed that this effect was nearly nonexistent on tri-linearly interpolated data. We have not tested the effect of using an anisotropic kernel on higher order interpolated data, which may yield different results.

There were several other algorithm parameters that were not formally optimized. For instance, the hysteresis thresholds used in edge detection were not optimized. However, in both the colon and lung, image contrast is excellent and edge detection was observed to be very robust. Also note that these thresholds do not affect the direction of detected gradients, which depend only on the convolution. Another example is the length of the projected cylinder. We have anecdotally observed both polyp-to-fold and nodule-to-vessel distances to be typically 15–20 mm, greater than the length of the projected cylinder. Direct visualization has demonstrated that cylinder overlap from neighboring structures is generally not a large problem.

D. CAD Algorithm Evaluation

The results of this preliminary evaluation of lung nodule detection were based on a dataset in which a large proportion of the nodules are due to one patient with metastatic disease ( ). Although we cannot distinguish the performance of our algorithm on primary bronchogenic carcinoma from the performance on metastases, we believe that the detection of both primary and metastatic nodules is important.
For those patients with pulmonary metastases secondary to colorectal cancer, many gynecological cancers, head and neck cancers, renal cell cancer, malignant melanoma, and sarcomas, pulmonary resection is an important primary therapy with a 5-year survival rate of 21%–68% [26], [27]. Also, we note that we did not evaluate the efficacy of our algorithm for detecting ground glass opacities in the lung. Further studies are needed to evaluate the performance characteristics on various types of lesions.

A limitation of the evaluation of colonic polyp detection is that it uses only supine data. Generally, both prone and supine images are used for CT colonography, but we evaluated the algorithm using only supine images because the problem of matching CAD results between prone and supine images is still unsolved. Additionally, treating prone and supine images of the same patient as independent would violate the assumption of independence that allows the cross-validation estimate of performance to be unbiased. Another point regarding the polyp evaluation is that not all FOC-determined polyps were found by the gold standard setter in the supine CT images ( ), probably due to retained water and/or other factors. Therefore, the CAD algorithm’s failure to identify these polyps is not a failure of the algorithm (since they were not visible in the images) but rather a failure of the CT colonography patient preparation and/or data collection. Thus, they were not counted against the algorithm in this evaluation.

While cross-validation mitigates the problem of over-fitting to the data, it does not remove biases that may be present in the entire dataset. While effort was made to avoid bias in the case selection, performance with this algorithm on other datasets may vary with factors such as patient population, image quality, scanner parameters, etc.
Although a greater number of cases were available, we did not utilize the entire database for this evaluation because the cross-validation technique required executing the algorithm on each case over a large number of values of the cylinder scale, which was computationally prohibitive.

For binary classification problems, the area under the ROC curve has been widely used as a performance metric. Analogously, the area under the AFROC curve has been described as a performance metric for multiple detection problems. We experimented with the latter calculated by a binormal FROC curve fitting procedure [28] but found the curve fitting to be unreliable on this data. The FP image model assumes that the operator generally makes less than one FP per image. Data points much beyond one FP per image become nearly indistinguishable due to the Poisson assumption, and thus the fitted curves are most unreliable in this region even though it may be of greatest interest for evaluating a CAD algorithm that will subsequently be reviewed by a human reader. We chose to use partial area under the FROC curve as a cost function for training because it was not prone to curve fitting errors under parametric assumptions. This is analogous to the ROC partial area index [29]. In particular, we found that using a partial area index was important so that the training optimized for the hardest to find polyps and nodules rather than the average polyps and nodules.

E. Comparison to Other CAD Algorithms

The SNO method differs from many of the previously proposed lung nodule CAD algorithms [5]–[8] in that rather than using a variety of basic shape descriptors such as perimeter, area, volume, sphericity, compactness, elongation, etc., we focus on a single shape measure that is tuned for the specific application(s).
Other approaches use some type of idealized shape model [9], [10] but achieve poorer performance, perhaps due to the lack of flexibility in such an explicit model rather than an implicit model that describes an entire gamut of shapes. The approach of McNitt-Gray et al. [30] exemplifies another important aspect of computer-aided diagnosis, the classification between benign and malignant. While our work does not specifically address this problem, we envision our work as part of a larger overall CAD scheme that will at some point also include classification.

The approach of Vining et al. [11] to polyp detection is notable because it attempts to detect polyps based on wall thickness rather than mucosal surface morphology. However, thus far, no other groups have reported success with this type of approach. The approaches of both Summers et al. [12], [13] and of Yoshida et al. [14]–[16] share in common the use of partial derivatives to compute principal curvatures. However, they differ in how the curvatures are combined and classified, which may partially account for the differences in performance. Also, Yoshida et al. add gradient concentration (GC) and directional gradient concentration (DGC) in order to improve performance. These two measures also compute the confluence of gradient vectors toward a common point, although they are quite different from the SNO method in actual formulation. However, direct comparison of performance of GC and DGC to this work is difficult because many other features are combined and also, per-polyp sensitivity is not reported. Quantitative comparisons are precluded by differences in study designs and by the relatively small number of datasets used in both this work and other published works.

While most CAD algorithms are described for a single clinical application, the CAD algorithm described in this paper was found to be promising at more than one task.
It performed favorably compared to many of the aforementioned CAD schemes, although differences in patient populations, CT technology, and analysis methods preclude strict quantitative comparisons.

VII. CONCLUSION

We have 1) developed a novel CAD algorithm, the surface normal overlap method, for both colonic polyp detection and solid lung nodule detection, 2) demonstrated the theoretical traits of this algorithm using a statistical shape model, 3) optimized its performance using CT simulations and a per-lesion cross-validation method, and 4) provided a preliminary evaluation of its performance in both detection tasks. The approach we have presented is generalized in that it is able to distinguish between focal lesions such as polyps and solid nodules and background anatomy such as blood vessels and haustral folds. While the CAD algorithm demonstrated in this paper has shown promise for both lung nodule and colonic polyp detection, we ultimately envision it as the first stage of a larger CAD scheme where a set of suspicious locations is passed on to a second stage, possibly composed of more computationally intensive classifier(s) that would aim to decrease the FP rate.

APPENDIX

In the following sections, the formulas for the expected CAD score (SNO or HT) of the various types of anatomic objects (polyps, folds, nodules, and vessels) are derived using the theoretical model. Note that each of the four anatomic object classes has its own values of the parameters that control the size and degree of shape variability. The SNO method has one value of the cylinder scale that is applied to polyps and folds alike and another that is applied to nodules and vessels alike. HT has one value of the accumulator bin size that is applied to polyps and folds alike and another that is applied to nodules and vessels alike.

A. SNO: Nodules and Polyps

The SNO score can be computed in terms of the weight of the surface normals in a given surface patch, the area of each surface patch, and the density of surface normals per unit area. The expected SNO score of a polyp or nodule is given by

The weight of each surface normal due to convolution has contributions from the entire length in the direction (along in Fig. 2). Using (1) and the relationship , the expected SNO score for a given sphere radius is as shown in the equation at the bottom of the page. After factoring, the integral becomes unity and we get

Regardless of dependence, the expectation of a sum of random variables is the sum of the expectations of each random variable, and we obtain

Reducing further, substituting for , and using the relationship , we get

Because we model polyps as hemispheres, we use to get

B. SNO: Vessels and Folds

For the sake of this analysis, a local coordinate system is chosen with the CAD hit at the origin, the cylinder axis in the -direction. The index variable varies along and the index variable varies as a function of angle around the axis. A surface patch whose position varies along the -direction has a normal vector that is known to pass through

Using (1) and , the distance from the line to the CAD hit at the origin, we get the equation at the bottom of the page. As before, the integral becomes unity after factoring and the expectation of a sum is switched for a sum of expectations

Reducing further and using the relationship we get

Because we model a fold as a half-cylinder, we use to get

C. HT: Nodules and Polyps

We model the Hough transform for spheres using a function that is 1 when a surface normal vector corresponds to a given accumulator bin of width , and 0 otherwise. This function examines , the distance in to the true center given , the quantized value of in the accumulator

This becomes

The expectation of a sum is switched for a sum of expectations

We define such that and we get

Because we model polyps as hemispheres, we use to get

D. HT: Vessels and Folds

Similar to before, we start with

We add subscripts to emphasize the dependence of on , and , and we get

The surface normal passes through and and the point is the point along the surface normal which corresponds to the quantized value . We get

The expectation of a sum is switched for a sum of expectations

We define such that

In the continuous case, is only a function of and we get

and, thus

Because we model a fold as a half-cylinder, we use to get

E. Noise Limit

In order to calculate the response due to noise, we examine the response due to a single edge element. We assume a surface patch of 1 and calculate each algorithm’s response. For SNO, the patch will have somewhere and, thus

For HT, a single patch will have and, thus

In order to make scores comparable to within a constant, we present results by normalizing all SNO scores by and all HT scores by .

ACKNOWLEDGMENT

The authors would like to thank Dr. D. Naidich, Dr. P. S. Desmond, Dr. A. Pineda, and the members of the 3-D Medical Imaging Laboratory in the Department of Radiology at Stanford University for helpful discussions.

REFERENCES

[1] J. D. Potter, M. L. Slattery, R. M. Bostick, and S. M. Gapstur, “Colon cancer: A review of the epidemiology,” Epidemiologic Rev., vol. 15, pp. 499–545, 1993. [2] S. J. Winawer, A. G. Zauber, M. N. Ho, M. J. O’Brien, L. S. Gottlieb, S. S. Sternberg, J. D. Waye, M. Schapiro, J. H. Bond, and J. F. Panish, “Prevention of colorectal cancer by colonoscopic polypectomy. The national polyp study workgroup,” New Eng. J. Med., vol. 329, pp. 1977–1981, 1993. [3] G. M. Strauss and L. Dominioni, “Perception, paradox, paradigm: Alice in the wonderland of lung cancer prevention and early detection,” Cancer, vol. 89, pp. 2422–2431, 2000.
[4] T. L. Petty, “Screening strategies for early detection of lung cancer: The time is now,” JAMA, vol. 284, pp. 1977–1980, 2000. [5] M. L. Giger, K. T. Bae, and H. MacMahon, “Computerized detection of pulmonary nodules in computed tomography images,” Investigat. Radiol., vol. 29, pp. 459–465, 1994. [6] S. G. Armato 3rd, M. L. Giger, C. J. Moran, J. T. Blackburn, K. Doi, and H. MacMahon, “Computerized detection of pulmonary nodules on CT scans,” Radiographics, vol. 19, pp. 1303–1311, 1999. [7] S. G. Armato 3rd, M. L. Giger, and H. MacMahon, “Automated detection of lung nodules in CT scans: Preliminary results,” Med. Phys., vol. 28, pp. 1552–1561, 2001. [8] M. S. Brown, M. F. McNitt-Gray, J. G. Goldin, R. D. Suh, J. W. Sayre, and D. R. Aberle, “Patient-specific models for lung nodule detection and surveillance in CT images,” IEEE Trans. Med. Imag., vol. 20, pp. 1242–1250, Dec. 2001. [9] Y. Lee, T. Hara, H. Fujita, S. Itoh, and T. Ishigaki, “Automated detection of pulmonary nodules in helical CT images based on an improved template-matching technique,” IEEE Trans. Med. Imag., vol. 20, pp. 595–604, July 2001. [10] S. G. Erberich, K. Song, H. Arakawa, H. K. Huang, W. Richard, K. S. Hoo, and B. W. Loo, “Knowledge-based lung nodule detection from helical CT [abstract],” Radiology, vol. 205P, p. 617, 1997. [11] D. J. Vining, Y. Ge, D. K. Ahn, and D. R. Stelts, “Virtual colonoscopy with computer-assisted polyp detection,” in Computer-Aided Diagnosis in Medical Imaging, K. Doi, H. MacMahon, M. L. Giger, and K. R. Hoffman, Eds. Amsterdam, The Netherlands: Elsevier Science B.V., 1999, pp. 445–452. [12] R. M. Summers, C. F. Beaulieu, L. M. Pusanik, J. D. Malley, R. B. Jeffrey Jr., D. I. Glazer, and S. Napel, “Automated polyp detector for CT colonography: Feasibility study,” Radiology, vol. 216, pp. 284–290, 2000. [13] R. M. Summers, C. D. Johnson, L. M. Pusanik, J. D. Malley, A. M. Youssef, and J. E. 
Reed, “Automated polyp detection at CT colonography: Feasibility assessment in a human population,” Radiology, vol. 219, pp. 51–59, 2001. [14] H. Yoshida and J. Nappi, “Three-dimensional computer-aided diagnosis scheme for detection of colonic polyps,” IEEE Trans. Med. Imag., vol. 20, pp. 1261–1274, Dec. 2001. [15] H. Yoshida, Y. Masutani, P. MacEneaney, D. T. Rubin, and A. H. Dachman, “Computerized detection of colonic polyps at CT colonography on the basis of volumetric features: Pilot study,” Radiology, vol. 222, pp. 327–336, 2002. [16] J. Nappi and H. Yoshida, “Automated detection of polyps with CT colonography: Evaluation of volumetric features for reduction of false-positive findings,” Academic Radiol., vol. 9, pp. 386–397, 2002. [17] G. Kiss, J. V. Cleynenbreugel, M. Thomeer, P. Suetens, and G. Marchal, “Computer-aided diagnosis in virtual colonography via combination of surface normal and sphere fitting methods,” European Radiol., vol. 12, pp. 77–81, 2002. [18] S. B. Gokturk, C. Tomasi, B. Acar, C. F. Beaulieu, D. S. Paik, R. B. Jeffrey Jr., J. Yee, and S. Napel, “A statistical 3-D pattern processing method for computer-aided detection of polyps in CT colonography,” IEEE Trans. Med. Imag., vol. 20, pp. 1251–1260, Dec. 2001. [19] B. Acar, C. F. Beaulieu, S. B. Gokturk, C. Tomasi, D. S. Paik, R. B. Jeffrey Jr., J. Yee, and S. Napel, “Edge displacement field-based classification for improved detection of polyps in CT colonography,” IEEE Trans. Med. Imag., vol. 21, pp. 1461–1467, Dec. 2002, to be published. [20] D. S. Paik, C. F. Beaulieu, R. B. Jeffrey Jr., G. D. Rubin, and S. A. Napel, “Detection of polyps in CT colonography: A comparison of a computer aided detection algorithm to 3D visualization methods [abstract],” Radiology, vol. 213P, p. 197, 1999. [21] J. Canny, “A computational approach to edge detection,” IEEE Trans. Pattern Anal. Machine Intell., vol. PAMI-8, pp. 679–698, 1986. [22] C. R. Crawford, private communication, 1998. [23] R. 
Kohavi, “A study of cross-validation and bootstrap for accuracy estimation and model selection,” in Proc. 14th Int. Joint Conf. Artificial Intelligence, 1995, pp. 1137–1145. [24] C. E. Metz, “Evaluation of CAD methods,” in Computer-Aided Diagnosis in Medical Imaging, K. Doi, H. MacMahon, M. L. Giger, and K. R. Hoffman, Eds. Amsterdam, The Netherlands: Elsevier Science B.V., 1999, pp. 543–554. [25] P. V. C. Hough, “Methods and Means for Recognizing Complex Patterns,” U.S. Patent 3 069 654, 1962. [26] “Long-term results of lung metastasectomy: Prognostic analyzes based on 5206 cases. The international registry of lung metastases,” J. Thoracic Cardiovasc. Surg., vol. 113, pp. 37–49, 1997. [27] V. W. Rusch, “Pulmonary metastasectomy. Current indications,” Chest, vol. 107, pp. 322–331, 1995. [28] D. P. Chakraborty, “Maximum likelihood analysis of free-response receiver operating characteristic (FROC) data,” Med. Phys., vol. 16, pp. 561–568, 1989. [29] Y. Jiang, C. E. Metz, and R. M. Nishikawa, “A receiver operating characteristic partial area index for highly sensitive diagnostic tests,” Radiology, vol. 201, pp. 745–750, 1996. [30] M. F. McNitt-Gray, E. M. Hart, N. Wyckoff, J. W. Sayre, J. G. Goldin, and D. R. Aberle, “A pattern classification approach to characterizing solitary pulmonary nodules imaged on high resolution CT: Preliminary results,” Med. Phys., vol. 26, pp. 880–888, 1999.