EXTENDED ABSTRACT

Introduction

In this work a method for liver metastases segmentation in FDG-PET is presented, which addresses the lesions for pre-treatment planning and the post-treatment necrotic area for the evaluation of treatment outcome. This issue is currently of great relevance because of the incidence of liver cancer and the growing use of imaging techniques in the field of cancer treatment. Automatic segmentation tools aim at faster and operator-independent results, though they still have to gain robustness and accuracy against the poor quality of PET images. The treatment technique considered here is image-guided tumor ablation, a minimally invasive strategy to treat focal tumors by introducing irreversible cellular injury through the application of thermal and, more recently, non-thermal energy or chemical injection. This approach has become a widely accepted technique in a range of clinical applications, including the treatment of focal tumors in the liver, lung, kidney, bone and adrenal glands. Several methods exist: radiofrequency, microwaves, laser, high-intensity focused ultrasound (HIFU) and cryoablation. Benefits of minimally invasive therapies compared to surgical resection include lower mortality and morbidity, lower cost and the ability to perform procedures on patients who are not good candidates for surgery. Among imaging methods, the most widely used for liver cancer detection is PET/CT, which provides both functional and anatomical information about the patient. In fact, 18F-FDG-PET, using a radioactive glucose tracer, is able to characterize glucose consumption, which is higher in metastases than in other tissues and virtually null in necrotic tissue after treatment. The main limitation of PET imaging is its low spatial resolution; to overcome this limit, CT imaging, with its higher spatial resolution, is used to place the hotspots identified by PET in the correct anatomical position. The present work introduces a segmentation method based on statistical Gaussian Mixture Model (GMM) clustering, capable of dealing with the noisy liver background. A correction based on Markov Random Field (MRF) theory introduces a prior in the iterative voxel classification, based on the previous classification of the neighbors, thus favoring the segmentation of connected sets. In addition, CT information is exploited for pre- to post-treatment registration, in order to compare the necrotic volume with the original lesion. Validation on phantom simulations and on a set of 12 patients (21 lesions) is performed by comparison between other segmentation methods and the ground truth or manual contours, for the simulated and clinical data sets respectively.

Methods

In the literature there are several segmentation techniques, which can be broadly grouped into manual, semi-automatic and automatic. Among the automatic techniques there are region-based algorithms, e.g. thresholding and region growing, and edge-based algorithms, i.e. those based on derivative filters. Other, more complex techniques include K-means, GMM and MRF models. In the specific case of liver tumor segmentation, algorithms are grouped according to the type of image used: CT, PET or joint PET/CT information. Algorithms that work on CT images include the fast discrete curvelet transform, knowledge-based constraints and support vector classification with watershed. Algorithms that work on PET images involve thresholding (42% of the maximum activity concentration; a sketch of this baseline is given below), modified thresholding and cluster analysis. Algorithms based on joint PET/CT information use tumor shape and appearance and fuzzy MRF models. Conversely, to our knowledge, the problem of post-treatment segmentation has received less attention so far.
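For reference, the fixed-percentage thresholding baseline (TH42) used in the comparisons below can be summarized by the minimal Python sketch that follows. It only illustrates the generic rule of keeping voxels above 42% of the maximum uptake within a region of interest; the function and array names are illustrative and do not reproduce the implementation used in this work.

    import numpy as np

    def threshold_42(pet_volume, liver_mask, fraction=0.42):
        # Fixed-percentage thresholding: keep voxels whose uptake exceeds a
        # fraction of the maximum activity inside the region of interest.
        # `pet_volume` and `liver_mask` are illustrative 3-D numpy arrays.
        masked = np.where(liver_mask, pet_volume, 0.0)
        threshold = fraction * masked.max()
        return masked > threshold

    # Toy example: a "hot" sphere on a uniform background (values are arbitrary).
    z, y, x = np.mgrid[0:64, 0:64, 0:64]
    sphere = ((z - 32) ** 2 + (y - 32) ** 2 + (x - 32) ** 2) < 8 ** 2
    pet = 2.0 + 8.0 * sphere                  # lesion-to-background ratio of 5
    liver = np.ones_like(pet, dtype=bool)     # trivial liver mask for the demo
    lesion_mask = threshold_42(pet, liver)
    print("segmented voxels:", lesion_mask.sum(), "true voxels:", sphere.sum())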
In this work, a preliminary liver segmentation step was performed in order to limit the working area. Several algorithms address the problem of liver segmentation in PET/CT images: a first group carries out the segmentation from CT images only, a second group from PET images, and a third merges information from both PET and CT. Here, liver segmentation was carried out on the CT images, because of their higher spatial resolution, and the segmented region was then superimposed on the PET images, thanks to the inherent co-registration provided by the hybrid PET/CT scanner. The proposed algorithm for liver segmentation was structured as follows. First, a coherence filter was applied: Weickert's equation with a plane-like kernel was used to remove noise. After that, the image was sharpened and a region-growing algorithm was applied in order to obtain the initial segmentation. The resulting volume still contained portions of surrounding organs and thus needed further refinement. Elimination of the unwanted structures was based on the size of the area segmented in each slice of the volume. Starting from the caudal liver slice, the two largest areas in each slice were identified and the Dice index with the liver area in the previous slice was computed, choosing the area with the highest overlap in the new slice. In parallel, a gross liver segmentation was carried out in PET by thresholding, and finally the volume was restricted to the region satisfying both segmentations.

The effect of treatment was evaluated by co-registering the pre-treatment volume, containing the hot lesions, to the post-treatment space, where they were replaced by the cold necrotic tissue. A sequence of linear and elastic B-spline registrations was needed; it was based on CT grey levels and applied to the PET images as well. Twelve patients gave informed consent and were enrolled in the present study, for which FDG-PET and CT were acquired on a PET/CT hybrid scanner (Biograph 6 True Point, Siemens Medical Solutions, Knoxville, TN, USA) prior to treatment and 24 hours post-treatment.

The pre-treatment processing algorithm consisted of two steps: background characterization and processing of the tumor-bearing slices. Background characterization was based on a set of PET slices free of detectable lesions. The statistics of healthy-liver intensity levels was modeled by a 4-class standard GMM, and the 4 Gaussian mean values were saved for the subsequent segmentation step. Lesion segmentation addressed a slab of a limited number of slices affected by tumor, increasing the number of GMM classes from the 4 background classes to 8. Initialization involved a standard GMM step (no neighborhood prior) with the background classes fixed at the previously found mean values and the remaining classes evolving towards higher values for the inclusion of hot lesional voxels. The following iterations included the neighborhood prior, in a hard and a soft version that differ in the type of neighborhood information used. At convergence, the final class assignment (i.e. the segmentation) was made according to the highest probability. In post-treatment, a similar background characterization by a 4-class standard GMM was performed. However, describing the cold necrotic tissue by further classes gave poor results, so the background characterization was used to fix the low threshold for the cold treated tissue.
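To make the clustering-with-neighborhood-prior idea concrete, the following Python sketch (assuming NumPy, SciPy and scikit-learn) fits a background GMM, initializes a larger GMM on the tumor slab with the background means as starting values, and then iteratively re-weights the class posteriors with a mean-field-style Potts prior over a 3x3x3 neighborhood. It is a simplified illustration of the approach, not the authors' hard or soft formulation; all names and parameters are assumptions.

    import numpy as np
    from scipy.ndimage import uniform_filter
    from sklearn.mixture import GaussianMixture

    def segment_slab(background_voxels, slab, n_bg=4, n_fg=4, beta=0.8, n_iter=10):
        # 1) background characterization on lesion-free voxels
        bg_gmm = GaussianMixture(n_components=n_bg, random_state=0)
        bg_gmm.fit(background_voxels.reshape(-1, 1))
        bg_means = np.sort(bg_gmm.means_.ravel())

        # 2) slab GMM: background means used as initialization (scikit-learn can
        #    only initialize them, whereas the described method keeps them fixed),
        #    extra components placed at higher intensities for hot lesion voxels
        init_means = np.concatenate(
            [bg_means, np.linspace(bg_means[-1], slab.max(), n_fg + 1)[1:]])
        gmm = GaussianMixture(n_components=n_bg + n_fg,
                              means_init=init_means.reshape(-1, 1), random_state=0)
        gmm.fit(slab.reshape(-1, 1))
        post = gmm.predict_proba(slab.reshape(-1, 1)).reshape(slab.shape + (-1,))

        # 3) neighborhood prior: each voxel's class probability is re-weighted by
        #    the average probability of the same class in a 3x3x3 neighborhood
        #    (a soft, mean-field-like Potts prior), then renormalized
        for _ in range(n_iter):
            neigh = np.stack([uniform_filter(post[..., k], size=3)
                              for k in range(post.shape[-1])], axis=-1)
            post = post * np.exp(beta * neigh)
            post /= post.sum(axis=-1, keepdims=True)

        labels = post.argmax(axis=-1)      # final assignment by highest probability
        return labels >= n_bg              # classes above the background ones

In this sketch the "soft" character comes from smoothing posterior probabilities; a hard variant would instead smooth the current label map before re-weighting.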
Validation was performed both on digital phantom spherical lesions, against the known ground truth, and on patients, against an expert's manual contours. The similarity of the estimates to their reference (i.e., ground truth and manual gold standard, for simulated and real data respectively) was evaluated by: 1) volume ratio; 2) Dice index; 3) Hausdorff distance. In pre-treatment, comparisons were performed between the hard and soft algorithms and the classical 42% maximum thresholding (TH42), classical GMM and K-means clustering. In post-treatment, the GMM-driven thresholding (THGMM) was compared with TH42 only, due to the poor results of any clustering method.

Results

In simulations, the pre-treatment algorithms showed a better performance than the automatic 42% threshold algorithm (TH42) in both the hard and the soft version, regardless of the ratio between phantom sphere intensity and phantom background intensity (volume ratio 1.432±0.211 hard, 1.021±0.042 soft, 3.523±0.932 TH42) and of the phantom sphere radius (1.121±0.023 hard, 1.431±0.321 soft, 3.832±0.632 TH42). The phantom intensity ratios and radii were set to values similar to clinical ones. In terms of Dice coefficient, the proposed hard and soft algorithms scored higher than the 42% threshold algorithm (TH42), both varying the phantom sphere radius (0.912±0.042 hard, 0.923±0.053 soft, 0.812±0.065 TH42) and the intensity ratio (0.965±0.012 hard, 0.953±0.013 soft, 0.791±0.052 TH42). Post-treatment phantom results showed, for the proposed GMM-driven threshold (THGMM), a volume ratio as a function of the sphere-to-background intensity ratio closer to 1 than the 42% threshold (1.045±0.095 THGMM, 1.421±0.021 TH42). Roughly the same held for the volume ratio as a function of the phantom sphere radius (1.121±0.342 THGMM, 1.254±0.342 TH42). In terms of Dice coefficient as well, the proposed algorithm yielded higher values than the threshold algorithm, both varying the intensity ratio (0.955±0.001 THGMM, 0.945±0.013 TH42) and the sphere radius (0.946±0.013 THGMM, 0.935±0.015 TH42).

In clinical data, the pre-treatment comparison encompassed the usual 42% maximum threshold (TH42) as well as standard GMM and K-means, in terms of similarity with the manual contouring by an expert: volume estimate / gold standard ratio (1.005±0.097 hard, 1.045±0.066 soft, 1.931±1.328 TH42, 0.324±0.168 GMM, 2.671±2.52 K-means), Dice (0.984±0.011 hard, 0.987±0.006 soft, 0.953±0.059 TH42, 0.678±0.021 GMM, 0.917±0.102 K-means), and Hausdorff distance (0.789±0.148 hard, 0.655±0.195 soft, 1.069±0.421 TH42, 0.789±0.148 GMM, 1.346±0.578 K-means). Post-ablation results by the proposed GMM-driven threshold (THGMM) were also more similar to the manual gold standard than TH42 in terms of volume ratio (1.026±0.079 THGMM, 2.475±1.139 TH42), Dice (0.966±0.011 THGMM, 0.862±0.062 TH42), and Hausdorff distance (1.336±0.323 THGMM, 1.884±0.304 TH42).
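For completeness, the three similarity measures used throughout the validation can be computed on binary masks as in the minimal Python sketch below (NumPy and SciPy assumed). Distances are in voxel units, so a scaling by the voxel spacing is needed to obtain physical values; the function names are illustrative.

    import numpy as np
    from scipy.spatial.distance import directed_hausdorff

    def volume_ratio(estimate, reference):
        # Ratio of estimated to reference volume (1.0 is ideal).
        return estimate.sum() / reference.sum()

    def dice_index(estimate, reference):
        # Dice similarity coefficient between two binary masks.
        intersection = np.logical_and(estimate, reference).sum()
        return 2.0 * intersection / (estimate.sum() + reference.sum())

    def hausdorff_distance(estimate, reference):
        # Symmetric Hausdorff distance between the voxel sets of two masks,
        # expressed in voxel coordinates.
        a = np.argwhere(estimate)
        b = np.argwhere(reference)
        return max(directed_hausdorff(a, b)[0], directed_hausdorff(b, a)[0])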
Conclusions

Simulations allowed the accuracy of the proposed algorithms to be reliably demonstrated, both for the pre-treatment detection of the hot objects and for the post-treatment segmentation of the cold area. Robustness was shown both by the low sensitivity to the neighborhood weight parameter over a wide range and by the close results given by the hard and the soft approaches. In the more difficult clinical environment, the reference could only be provided by manual contouring taken as the gold standard. Nonetheless, in all patients the proposed algorithms outperformed the compared ones, thus fostering further validation on wider data sets and foreseeing a consistent contribution to clinical practice, both in the intervention planning phase and in the post-treatment assessment.