TeraTomo project: a fully 3D GPU based reconstruction code for exploiting the imaging capability of the NanoPET™/CT system

M. Magdics 1), L. Szirmay-Kalos 1), Á. Szlavecz 1), G. Hesz 1), B. Benyó 1), Á. Cserkaszky 2), J. Lantos 2), D. Légrády 2), Sz. Czifrus 2), A. Wirth 3), B. Kári 3), G. Patay 4), D. Völgyes 4), T. Bükki 4), P. Major 4), G. Németh 4), B. Domonkos 4)

1) Department of Informatics, Budapest University of Technology and Economics, Hungary
2) Institute of Nuclear Techniques, Budapest University of Technology and Economics, Hungary
3) Department of Radiology, Semmelweis University of Budapest, Hungary
4) Mediso Ltd., Budapest, Hungary

The TeraTomo project is dedicated to the development of a fully 3D iterative reconstruction code for multi-modality (PET/SPECT/CT) imaging. So far, we have employed the EM/OSEM scheme for the reconstruction of PET images, focusing on computing the system matrix elements on the fly as precisely as possible, taking the following physical effects into account: 3D geometry, detector response, positron range, attenuation, and scatter in the medium. The reconstruction algorithms have been tailored to the massively parallel GPU platform (using CUDA technology), so that the code can run in parallel on multiple graphics cards [1]. The reconstruction algorithm employs Monte Carlo (MC) techniques for sampling the lines of response (LORs) and the voxels in the forward projection and backprojection steps. To cope with the ill-posedness of the EM scheme, our implementation contains regularization techniques such as Gaussian and median filtering, as well as Total Variation (TV) regularization, which significantly increases the quality of the reconstructions at a negligible additional cost. With these advancements, voxel arrays of more than 256³ resolution can be reconstructed in a few minutes.
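As a point of reference for the EM scheme named above, the following minimal sketch shows one textbook ML-EM update on a tiny, explicitly stored system matrix. It is an illustration only, not the TeraTomo implementation: the poster states that system matrix elements are computed on the fly, whereas the dense matrix `A`, the function names `mlem_step` and `mlem`, and the uniform starting image are assumptions made here for brevity. OSEM subsets, filtering, and TV regularization are omitted.

```python
def mlem_step(A, x, y, eps=1e-12):
    """One ML-EM update.

    A[l][v]: probability that a decay in voxel v is detected in LOR l
             (hypothetical dense matrix, for illustration only).
    x[v]:    current activity estimate per voxel.
    y[l]:    measured hits per LOR.
    """
    n_lors, n_vox = len(A), len(x)
    # Forward projection: expected number of hits in every LOR.
    y_exp = [sum(A[l][v] * x[v] for v in range(n_vox)) for l in range(n_lors)]
    # Ratio of measured to expected hits (guarded against division by zero).
    ratio = [y[l] / max(y_exp[l], eps) for l in range(n_lors)]
    x_new = []
    for v in range(n_vox):
        # Sensitivity of voxel v: total detection probability over all LORs.
        sens = sum(A[l][v] for l in range(n_lors))
        # Backprojection of the ratios gives the multiplicative correction s_v.
        back = sum(A[l][v] * ratio[l] for l in range(n_lors))
        s_v = back / sens if sens > 0.0 else 0.0
        x_new.append(x[v] * s_v)
    return x_new


def mlem(A, y, n_iter=50):
    """Run n_iter ML-EM updates starting from a uniform image."""
    x = [1.0] * len(A[0])
    for _ in range(n_iter):
        x = mlem_step(A, x, y)
    return x
```

With noise-free data that are consistent with the matrix, the true activity is a fixed point of the update, which makes a convenient sanity check for any forward/backprojector pair.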
Verification

At the current stage of TeraTomo development, we have successfully implemented and carefully verified the 3D geometry based reconstruction engine, including detector response modeling.

[Figure: verification workflow. Labels: Mathematical phantoms; TeraTomo tomography reconstruction; Source estimation; Compare; Compute expected detector response.]

System overview

The reconstruction algorithm is an iterative maximum likelihood estimation method (EM/OSEM), which alternately executes photon transport simulation (forward projection) and source correction (backprojection). We implemented two simulation approaches, both running on multiple GPUs:

1. Monte Carlo particle transport simulation (MC) [2]
2. Adjoint Monte Carlo approximation (adjMC) [3]

[Figure: reconstruction loop. Labels: Phantom; GATE; Measured LORs y_L; Reconstructed voxels x_V^(n); Filtering (x̂_V); Forward projection; Expected detector response ỹ_L; Backprojection + TV regularization; x_V^(n+1) = s_V · x_V^(n).]

Monte Carlo particle transport (MC)

In both forward projection and backprojection, MC assigns particles to GPU computing threads. Each thread simulates the particle transport by closely mimicking nature: every possible interaction is sampled from the corresponding probability distribution for as long as the particle is inside the object (phantom). A particle started in a voxel hits a detector only with a certain probability; when it misses, the computing effort spent on it is lost. To reduce this price paid for simulation accuracy, non-analog MC techniques are used, such as source direction biasing, implicit capture, biased source sampling, and precomputed detector response. Free paths are sampled by Woodcock tracking, so the simulation efficiency depends only loosely on the material composition. Each particle history contains only a few events from positron annihilation to escape or energy cut-off, hence the computing threads hardly diverge.
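The Woodcock (delta) tracking mentioned above can be illustrated with a short CPU sketch. This is a generic textbook version under stated assumptions, not the project's GPU kernel: the function name `woodcock_free_path`, the callable `sigma_at` that returns the local total cross section, and the majorant `sigma_max` are hypothetical names introduced here. Tentative collision sites are drawn as if the medium were homogeneous with cross section `sigma_max`, and each site is accepted as a real collision with probability `sigma_at(p) / sigma_max`, so no material boundaries need to be intersected.

```python
import math
import random


def woodcock_free_path(pos, direction, sigma_at, sigma_max, max_dist, rng):
    """Sample the distance to the next real collision along a ray.

    pos, direction: 3-tuples (direction is assumed to be normalized).
    sigma_at(p):    total cross section at point p, in 1/length (hypothetical).
    sigma_max:      majorant with sigma_at(p) <= sigma_max everywhere.
    max_dist:       distance at which the particle leaves the object.
    Returns the sampled distance, or None if the particle escapes.
    """
    t = 0.0
    while True:
        # Exponential step in the fictitious homogeneous medium.
        # 1 - random() lies in (0, 1], so the logarithm is always defined.
        t += -math.log(1.0 - rng.random()) / sigma_max
        if t >= max_dist:
            return None  # left the object without a real collision
        p = tuple(pos[i] + t * direction[i] for i in range(3))
        # Accept as a real collision with probability sigma_at(p) / sigma_max;
        # otherwise it was a virtual collision and the flight continues.
        if rng.random() * sigma_max < sigma_at(p):
            return t
```

Because every thread runs the same simple loop regardless of the materials it crosses, this kind of sampling tends to suit GPU execution, which is in line with the poster's remark that efficiency depends only loosely on the material composition.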
The overall simulation speed reaches up to 5·10⁸ positrons/sec, roughly 2.5 orders of magnitude faster than a general purpose MC particle transport code. The advantages of faithful physics simulation are expected in media with high scattering components, in exchange for slower convergence and a more pronounced tendency for noise build-up. Forward projection and backprojection employ the same MC engine for calculating the system matrix elements. The optimal voxel sampling density in both reconstruction phases was found to be proportional to the activity.

[Figure labels: 4D detector response image; Source correction.]

To validate our system, we used both simulated and measured data. Mathematical phantoms were simulated by GATE [4]. Both simulated and measured list-mode data were binned into LOR files before reconstruction. Then, taking the simulated detector hits, we reconstructed the source distribution with our program and compared the result with the original phantom. In the case of measured data, the normalization information was passed to the TeraTomo reconstruction engine.

PET equipment: NanoPET™/CT

The NanoPET™/CT is an ultra-high resolution, high sensitivity pre-clinical PET/CT system built from the most advanced commercially available components: an 18 cm diameter PET detector polygon with 12 detector modules, each consisting of 81×39 LYSO crystals (1.12×1.12×13 mm³) tightly packed and coupled to two 256-channel PS-PMTs. The imaging capability of this system can only be exploited by a fully 3D reconstruction algorithm that models the detector response, positron range, gamma attenuation, and scatter effects.

Speed of reconstruction

When the Derenzo phantom is reconstructed on 144×144×128 voxels in the NanoPET™/CT PET detector geometry [5], which contains 180 million LORs, the speed of reconstruction is similar to that of FORE rebinning and 2D EM running on multiple CPUs. The TeraTomo reconstruction times on a single Nvidia Fermi GPU card are shown in the following table.
            Forward projection    Backprojection
3D-MC       28 sec/iteration      28 sec/iteration
3D-adjMC    2 sec/iteration       16 sec/iteration

When two GPU cards are enabled, the running time is halved.

Effect of Total Variation regularization

TeraTomo reconstruction of the rotated Derenzo phantom; the rod sizes are 1.0, 1.1, 1.2, 1.3, 1.4, and 1.5 mm. The GATE simulation in the NanoPET™/CT PET detector geometry was reconstructed into a 144×144×128 voxel volume.

[Figure: central sagittal slice of the reconstructed volume after 50 iterations.]

Noise tolerance

Regularization methods ensure correct reconstruction even for low-dose measurements, where the average number of hits per LOR is very low.

[Figure: left, a Derenzo phantom reconstructed from 4 hits per LOR on average; right, the reconstruction from only 0.2 hits per LOR on average.]

Adjoint Monte Carlo approximation (adjMC)

The adjMC method employs approximate adjoint transfer in forward projection and a geometric backprojection, assigning LORs to threads in forward projection and voxels to threads in backprojection. To increase the accuracy of the integral quadratures, we use quasi-Monte Carlo techniques combined with stochastic iteration and filtered sampling.

Forward projection of the adjMC method

[Figure: forward projection pipeline. Direct branch: geometry, voxels, positron range, detector model. Indirect branch: scatter + attenuation + detector model. Combination + stochastic iteration yields the expected LORs.]

The forward projector of the adjoint method examines the LORs one by one and computes the expected number of hits due to annihilations in the voxels intersected by the line samples of the LOR (direct contribution) and in all voxels (scattered contribution). Scattering in the object and scattering in the detectors are handled independently via the interface of sample points generated on the surface of the detector crystals. The high-dimensional integral of a LOR is estimated by quasi-Monte Carlo quadrature, including Poisson-disk sampling.
To reduce the variance of this estimator, we employ both spatial and temporal filtering. Temporal filtering, called stochastic iteration, averages the estimated LOR values with the results of previous steps. Spatial filtering may include either Gaussian or median filtering before the execution of the forward projector.

Scatter compensation of adjMC method

[Figure: three steps of the scatter estimation, with scattering points s1, s2 and detector points z1, z2. 1. Scattering points. 2. Ray marching from detector to scattering points. 3. Combination of paths.]

The sampling process that estimates the scattered contribution has three steps. First, scattering points are sampled with a probability density proportional to the scattering cross section of the material. Then, the total annihilation and out-scattering between these sample points and the detector crystals are computed. In the final step, these partial results are combined and the direct component is computed. As the number of crystal pairs is much larger than the number of crystals and scattering points, the scattering computation adds only a small overhead to that of the direct component.

Results of incorporating detector response modeling

3D reconstruction of the GATE simulated Derenzo phantom. Modeling the 3D geometry and the detector response in the reconstruction significantly improved the image quality.

[Figure panels: 2D reconstruction: SSRB + OSEM; 3D adjMC reconstruction without detector response compensation; 3D adjMC reconstruction with detector response compensation.]

[Figure: results of measured data reconstruction.]

Detector response modeling

Photons may get scattered in the detector crystals before they are finally absorbed. Unlike the measured object, which is different in each measurement, the detectors are fixed, so the probabilities of photon paths between detector crystals can be pre-computed, and these pre-computed data can be included in the estimator.
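The temporal filtering (stochastic iteration) used by the adjMC forward projector can be illustrated with a running average over successive noisy estimates. The poster only states that estimated LOR values are averaged with the results of previous steps; the equal-weight 1/n scheme and the name `stochastic_iteration` below are assumptions chosen for this sketch.

```python
def stochastic_iteration(estimates):
    """Yield the running average of successive noisy LOR estimates.

    estimates: iterable of lists, one list of per-LOR values per step.
    The 1/n weight (plain arithmetic mean) is an assumed weighting scheme.
    """
    avg = None
    for n, est in enumerate(estimates, start=1):
        if avg is None:
            # First step: the average is the estimate itself.
            avg = list(est)
        else:
            # Blend the new estimate into the average with weight 1/n.
            w = 1.0 / n
            avg = [(1.0 - w) * a + w * e for a, e in zip(avg, est)]
        yield list(avg)
```

Averaging in this way lets each step use only a few samples per LOR, since the variance of the averaged value decreases as more steps are accumulated.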
During the pre-computation, we consider a single input crystal, simulate incident photons arriving from a direction given by its inclination and azimuthal angles, and compute the probabilities that such a photon is absorbed in another crystal. These probabilities can be visualized as a two-dimensional image, where arrival probabilities are depicted by gray levels. These images are too large to be sampled efficiently. Therefore, during the pre-computation we pre-generate quasi-Monte Carlo sample sets that contain just a few samples, but whose cumulative distribution is as close to the simulated distribution as possible.

Backprojection of the adjMC method

[Figure: a voxel sample v between Detector 1 and Detector 2, with detector points z1 and z2.]

The backprojector of the adjMC method has been developed with the objective of efficient GPU execution. Thus, it is also of gathering type: a thread computes the update of a voxel from all LORs intersecting it. To select these LORs, a point is sampled in the voxel, then the detector module is centrally projected onto its coincidence pair via the voxel sample. The backprojector is also responsible for TV regularization, so it also reads the neighboring voxel values of the previous step, obtains the gradient, and includes it in the iteration formula.

[Figure: 3D EM reconstruction of the F-18 mouse bone PET study acquired with the NanoPET™/CT, without scatter, attenuation, or detector model compensation.]

Future work

We are going to verify the scatter and attenuation modeling in the near future. Then we plan to implement random coincidence modeling as well as dead time correction in order to achieve a quantitative 3D reconstruction tool.

References

[1] DOMONKOS, B., AND JAKAB, B. A Programming Model for GPU-based Parallel Computing with Scalability and Abstraction. In Spring Conference on Computer Graphics (2009), pp. 115–122.
[2] WIRTH, A., CSERKASZKY, A., KÁRI, B., LÉGRÁDY, D., FEHÉR, S., CZIFRUS, S., AND DOMONKOS, B.
Implementation of 3D Monte Carlo PET Reconstruction Algorithm on GPU. In IEEE Medical Imaging Conference (2009).
[3] SZIRMAY-KALOS, L., TÓTH, B., MAGDICS, M., LÉGRÁDY, D., AND PENZOV, A. Gamma Photon Transport on the GPU for PET. Lecture Notes in Computer Science 5910 (2010), pp. 433–440.
[4] GATE, see http://opengatecollaboration.healthgrid.org/, and JAN, S., et al. GATE: a simulation toolkit for PET and SPECT. Phys. Med. Biol. 49 (2004), pp. 4543–4561.
[5] For more information about the NanoPET™/CT device, visit: http://www.bioscan.com/molecular-imaging/nanopet-ct