BI-PLANAR IMAGE REGISTRATION AND MODELING OF BONES

by Joshua Boyd King

A thesis submitted to the Faculty and the Board of Trustees of the Colorado School of Mines in partial fulfillment of the requirements for the degree of Master of Science (Engineering Systems).

Golden, Colorado
Date
Signed: Joshua Boyd King

Approved: Dr. William Hoff, Professor of Engineering, Thesis Advisor

Golden, Colorado
Date
Dr. Terry Parker, Professor and Head, Department of Engineering

ABSTRACT

To apply computer aided surgery to bone related procedures, a patient-specific 3D model of the bone needs to be produced preoperatively. This can be accomplished very accurately by performing a CT scan of the patient and segmenting the data to create a surface model, which would then need to be rigidly registered to the patient during surgery. Unfortunately, building a model in this manner is undesirable because it is time consuming, costly and exposes the patient to large amounts of X-rays. To minimize cost, time and X-ray exposure it is desirable to use a deformable 3D model that can be altered during the registration process to match any patient. The use of a deformable model directly couples the modeling and registration processes together, resulting in a 6 (rotation and translation) + N degree of freedom optimization problem, where N is the number of deformable shape parameters.

Dr. Mahfouz and his colleagues at the University of Tennessee in Knoxville have developed a registration method that uses a statistical bone atlas to estimate patient specific femur models from two digitally reconstructed radiographs (synthetic X-ray images) of the right femur. The objective of this work is to improve the speed and accuracy of the method, specifically by modifying the existing evaluation function and using a new search strategy. These improvements are then applied to the registration and modeling problem for the right femur and L5 lumbar vertebra.

TABLE OF CONTENTS

ABSTRACT
LIST OF FIGURES
LIST OF TABLES
ACKNOWLEDGMENT
Chapter 1 INTRODUCTION
Chapter 2 BACKGROUND
    2.1 Image Registration
    2.2 Similarity Measures
    2.3 Statistical Atlas Models
Chapter 3 IMPROVEMENTS TO EVALUATION FUNCTION
    3.1 Image Gradients and the Effects of Blurring
    3.2 Edge Segmentation
    3.3 Single Point Contribution
Chapter 4 IMPROVEMENTS TO SEARCH METHOD
    4.1 Cross-Correlation
    4.2 Two Stage Optimization
Chapter 5 IMPLEMENTATION DETAILS
    5.1 Genetic Algorithm
    5.2 Digitally Reconstructed Radiographs
    5.3 Virtual Reality Toolbox and Camera Calibration
Chapter 6 EVALUATION AND TESTING
    6.1 Improvements to Evaluation Function
    6.2 Improvements to Search Method
    6.3 Experimental Results of the Full Algorithm
        6.3.1 Femur Results
        6.3.2 L5 Lumbar Vertebra Results
Chapter 7 SUMMARY
    7.1 Major Results and Conclusions
    7.2 Recommendations for Future Work
REFERENCES

LIST OF FIGURES

2.1 The (a) predicted region, (b) edge and (c) smoothed edge images.
2.2 Femur atlas model example modes of variation
3.1 X-ray intensity profiles and corresponding derivatives with (dashed) and without (solid) Gaussian blurring.
3.2 Shift in the magnitude of the image gradient at different bone radii and Gaussian blurs
3.3 Example of morphological erosion
3.4 The (a) lumbar DRR and (b) corresponding gradient image.
3.5 Canny edge detector applied to a lumbar DRR.
3.6 The true edge near erroneous data points where the predicted edge is spread.
3.7 The true edge near erroneous data points where the X-ray edge is spread.
4.1 Rays passing through the image planes of both camera views.
4.2 Flow chart of the two stage optimization strategy.
5.1 Casting rays through a CT volume
6.1 The plot of mean errors and the corresponding 90% confidence intervals (error bars) for the four test cases used for the femur rigid optimization.
6.2 The plot of mean errors and the corresponding 90% confidence intervals (error bars) for the four test cases used for the lumbar rigid optimization.
6.3 Comparison of the average magnitude errors of the femur with the standard search method in blue and the cross-correlation search method in green.
6.4 Comparison of the average magnitude errors of the lumbar with the standard search method in blue and the cross-correlation search method in green.
6.5 Typical starting pose for the right femur.
6.6 Typical optimized shape for the right femur at the ICP registered rigid pose.
6.7 Typical starting pose for the L5 lumbar vertebra.
6.8 Typical optimized shape for the L5 lumbar vertebra at the ICP registered rigid pose.
7.1 Lumbar vertebra rendered partially transparent.

LIST OF TABLES

6.1 Mean errors and 90% confidence intervals for the four test cases used for the femur rigid optimization.
6.2 Mean errors and 90% confidence intervals for the four test cases used for the L5 lumbar rigid optimization.
6.3 Optimized femur shape point to surface errors.
6.4 Analysis of optimized femur shape point to surface errors
6.5 Optimized lumbar shape point to surface errors.

ACKNOWLEDGMENT

I would like to thank Dr. Mohamed Mahfouz from the University of Tennessee in Knoxville for the opportunity to work on this project and for supporting my research.

Chapter 1 INTRODUCTION

The main focus of orthopedic medicine is the prevention and correction of injuries or deformities of the skeletal system and its associated muscles and joints. In orthopedic surgeries, it is important to accurately cut or drill bones. Computer aided surgery (CAS) technology has the potential to greatly improve the accuracy of such surgeries, and thus improve the overall safety and recovery time of the patients.
Most CAS technologies need a 3D model of the bone [1]. This is because they register sensor data (e.g., fluoroscopic images) to a 3D model of the bone and then estimate the pose of the patient with respect to the sensor. The standard method of producing the 3D model is to perform a preoperative CT scan of the patient and manually segment the 3D volume data. This process is costly and time consuming, and it exposes the patient to a significant radiation hazard.

An alternative is to create the 3D models using a small number (e.g., 2) of still X-rays. This approach would be faster, cheaper, and expose the patient to less radiation. The problem is that there is not enough information to uniquely reconstruct a 3D model from two X-ray projections alone. One way to deal with this ambiguity is to use a priori knowledge of the shape of the bones. This knowledge can be captured in the form of a deformable statistical atlas, created from examples of many bones. The use of such a deformable model directly couples the modeling and registration processes together, resulting in a 6 (rotation and translation) + N degree of freedom optimization problem, where N is the number of deformable shape parameters.

A group at the University of Tennessee [2] has developed a registration method that uses a statistical bone atlas to estimate patient specific femur models from two digitally reconstructed radiographs (synthetic X-ray images) of the right femur. The objective of this work is to improve the speed and accuracy of this method, specifically by modifying the existing evaluation function and using a new search strategy. These improvements are then applied to the registration and modeling problem for the right femur and L5 lumbar vertebra.
The remainder of this thesis is organized as follows: Chapter 2 covers the relevant background material and previous work; Chapters 3 and 4 discuss the modifications to the existing Mahfouz method and their justifications; Chapter 5 covers the implementation details of the genetic algorithm, digitally reconstructed radiographs and the virtual camera calibration; Chapter 6 presents the evaluation and experimental results of the modifications discussed in Chapters 3 and 4; and finally Chapter 7 summarizes this work and gives recommendations for future work.

Chapter 2 BACKGROUND

2.1 Image Registration

The objective of image registration is to use one or more 2D images of an object to find the rigid transformation from the corresponding 3D model in the computer frame to the world frame of reference. Given a 3D model of the object and a model of the imaging process, a 2D image of the model can be predicted at a given rigid pose and compared to the image of the actual object. Specifically for this research, the 3D models are of bones (right femur or L5 lumbar vertebra), which are registered to X-ray images. Directly comparing the predicted images to the X-ray images is considered an intensity based method, where the predicted images can vary from silhouettes [3] of a 3D model to digitally reconstructed radiographs [4] computed from CT data.

The alternative to intensity based methods is feature based methods, such as [5] and [6]. Feature based methods rely on the extraction/segmentation of landmarks or other features, such as contours and silhouettes, from the X-ray images, which are then rigidly aligned to the same features on the 3D model. The segmentation can be performed either manually or automatically, and once it is complete the method tends to be faster than other methods. However, the segmentation process can be time consuming and prone to error [6].
When using only one image in the registration process it is difficult to find the depth of the object from the camera accurately without a secondary device, such as a range sensor. Bi-planar registration can more easily establish the depth of the object from the first camera using the second camera and vice versa.

2.2 Similarity Measures

The similarity measures described below were originally developed by Dr. Mahfouz and his colleagues to register 3D models of knee implants to single plane fluoroscope images [3] of patients with knee implants. The evaluation function is based on an edge and region similarity measure and was optimized using simulated annealing. This method has since been applied to bi-planar image registration [2], where a statistical bone atlas is registered to two approximately orthogonal synthetic X-ray images. The bi-planar case used a coarse to fine optimization strategy with a genetic algorithm to optimize the evaluation function for both the rigid and shape parameters.

There are two types of predicted images that are used in the region and edge similarity measure. The region image (Figure 2.1(a)) is the binary silhouette of the projected 3D model. The corresponding smoothed edge image (Figure 2.1(c)) is created by running the silhouette through an edge detector (Figure 2.1(b)) and then filtering the result with an N by N Gaussian mask. The edge is smoothed to allow points near the edge to contribute to the score by an amount that decreases with increasing distance from the edge. Edge points beyond N pixels from the edge are given zero weight, where N is chosen to allow for some degree of modeling error, but minimized to reject erroneous edge points.

Figure 2.1. The (a) predicted region, (b) edge and (c) smoothed edge images.

The region score (Equation 2.1) for each individual view is computed by multiplying the X-ray image (I_j) and predicted region image (P_j) together, summing the result and normalizing by the sum of the predicted image. Here, j = 1, 2 for the two views.

R_j = \frac{\sum_u \sum_v I_j(u,v) \, P_j(u,v)}{\sum_u \sum_v P_j(u,v)}    (2.1)

The edge score (Equation 2.2) is very similar to the region score, with the exception of using the edge enhanced (gradient magnitude) X-ray image (Ie_j) and the predicted smoothed edge image (Pe_j).

E_j = \frac{\sum_u \sum_v Ie_j(u,v) \, Pe_j(u,v)}{\sum_u \sum_v Pe_j(u,v)}    (2.2)

Equations 2.1 and 2.2 are essentially the average intensities of the X-ray image and the edge enhanced X-ray image over the predicted region and predicted edge, respectively. Since it is assumed that the bone in the input X-ray image will have a higher intensity than the surrounding soft tissue and also have a strong edge, both functions should be at a maximum when the predicted and X-ray images are correctly aligned. The total score for each image is a weighted combination of both the region and edge scores:

S_j = \alpha R_j + \beta E_j    (2.3)

where the edge score is given a higher weight so that it dominates the score when close to the true solution. The overall score is the sum of the total scores over all j views. This score is then optimized to find the best pose.

There are a few disadvantages with the existing method. First, strong edges in the gradient magnitude may bias the solution (Section 3.2). Second, the spreading of the predicted edge allows multiple edge points to contribute to the overall score (Section 3.3). Third, many function evaluations are needed during the search; for example, in [3] up to 10 thousand evaluations were needed (Chapter 4).

2.3 Statistical Atlas Models

The statistical atlas models used in [2] and [7] were constructed by applying principal component analysis (PCA) [8] to data sets comprised of surface models segmented from CT. The primary benefit of using PCA is that the dimensionality of a data set can be reduced by only using the principal components that represent the most significant variations from the statistical mean.
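As a rough illustration of this truncation, the sketch below picks the smallest number of leading principal components whose eigenvalues account for a target fraction of the total variance. The function name and the eigenvalue spectrum are hypothetical and are not taken from the atlas data; the sketch is in Python for brevity.

```python
def components_for_variance(eigenvalues, target=0.99):
    """Return the smallest k such that the k largest eigenvalues
    account for at least `target` of the total variance."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    running = 0.0
    for k, value in enumerate(ordered, start=1):
        running += value
        if running / total >= target:
            return k
    return len(ordered)


# Hypothetical, rapidly decaying eigenvalue spectrum (not atlas data).
spectrum = [0.5 ** i for i in range(20)]
k = components_for_variance(spectrum, target=0.99)
print(k)
```

Only the first k shape coefficients then need to be searched during optimization, which is what keeps the N in the 6 + N degree of freedom problem small.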
The femur atlas, for example, has a total of 188 principal components, where the first nine components represent approximately 99% of the variation of the model. An example of the variations of the femur atlas is shown in Figure 2.2 for the first three principal components.

Figure 2.2. Femur atlas model example modes of variation

Chapter 3 IMPROVEMENTS TO EVALUATION FUNCTION

3.1 Image Gradients and the Effects of Blurring

In image processing the gradient magnitude of an image (Equation 3.1) is used to highlight abrupt changes in intensity that occur across object boundaries. In the absence of blurring, the maximum of the gradient image is coincident with the edges of the original image.

I_{gm}(u,v) = \sqrt{\left(\frac{\partial I}{\partial u}\right)^2 + \left(\frac{\partial I}{\partial v}\right)^2}    (3.1)

Image blurring can occur naturally in the imaging process and is often applied artificially to suppress noise. The effect of blurring is that it can shift the maximum of the gradient so that it is not coincident with the true edge locations. This shift depends on the intensity profile of the edge and the amount of blurring present in the image. To estimate the magnitude of this effect, a cross-sectional slice through the shaft of the femur is approximated as a solid disk of uniform density (that is, the bone is modeled as a cylinder), and its X-ray intensity profile under parallel projection is estimated using the following equation:

I_p(x) = I_o - I_o \exp(-\mu \, t(x))    (3.2)

where I_o is the maximum intensity, \mu is the X-ray linear attenuation coefficient for bone and t(x) is the thickness of the disk as a function of the distance from its center. To simulate blurring, I_p is convolved with a Gaussian. The overall process is shown graphically in Figure 3.1, where the maximum of the derivative of the blurred profile appears to "shrink" the observed silhouette from its predicted (non-blurred) location.

Figure 3.1. X-ray intensity profiles and corresponding derivatives with (dashed) and without (solid) Gaussian blurring.

Figure 3.2 shows the results of computing the edge location error for multiple disk radii and Gaussian blurs.

Figure 3.2. Shift in the magnitude of the image gradient at different bone radii and Gaussian blurs

In general an X-ray image, whether actual or synthetic, will have some degree of blurring, which will shift the maximum of its gradient. The predicted image has little to no blurring, and therefore the location of the maximum of its gradient is assumed to be coincident with the true edge of the projected 3D model. This difference in blurring results in a misalignment of the maximum of the gradients in each image. The edge shifts shown in Figure 3.2 are approximately constant beyond a certain bone thickness; therefore the shift itself is assumed to be a constant N, depending only on the degree of blurring present in the image. In order to correctly align the predicted edges with the observed edges, the silhouette of the predicted bone is shrunk by N = 2 pixels prior to edge detection.

Morphological erosion is used to perform the shrinking of the silhouette because it is a simple and relatively fast method when used on binary images, such as the predicted images. Erosion can be defined as the set operation

A \ominus B = \{ z \mid (B)_z \subseteq A \}    (3.3)

where the erosion of set A by set B is the set of all points z such that B, translated by z, is contained within A [9]. Figure 3.3 below shows example sets A and B and the resulting erosion of A by B. To shift the edges inward by the two pixels desired, the erosion operation is carried out twice.

Figure 3.3. Example of morphological erosion

3.2 Edge Segmentation

In X-rays of the femur the boundaries of the bone are fairly uniform and of a much higher intensity than the surrounding soft tissue. The same is not true for X-rays of the spine due to the irregular shape of the vertebrae.
As can be seen in Figure 3.4(a), the X-ray of the L5 lumbar vertebra varies greatly in intensity and portions of the bone cannot be easily distinguished from the surrounding soft tissue, which affects the intensities of the image gradient (Figure 3.4(b)) in the same way.

Figure 3.4. The (a) lumbar DRR and (b) corresponding gradient image.

This non-uniformity in the gradient can be problematic when using the Mahfouz edge similarity measure (Equation 2.2) because strong edges will greatly dominate the score. To eliminate this problem, the dim and strong edges need to contribute equally to the score. This can be accomplished by segmenting the gradient and giving it a constant value. A Canny edge detector [10] is used to produce the segmented edge image (Figure 3.5). The Canny detector was selected because it attempts to detect weak gradients by linking them to strong gradients, thus allowing weak contours to be segmented.

Figure 3.5. Canny edge detector applied to a lumbar DRR.

3.3 Single Point Contribution

The original edge similarity measure spreads the predicted edge image before comparing it with the input X-ray edge image. By spreading the predicted edges, there will be multiple points per original predicted edge pixel that can contribute to the overall score. Consider the case of an edge contour near erroneous edge data points shown in Figure 3.6, where the solid vertical line is the correct edge. If the predicted vertical spread edge is evaluated at every possible position, the maximum of the score will likely occur between the correct edge and the erroneous data points if they are close enough together.

Figure 3.6. The true edge near erroneous data points where the predicted edge is spread.

Instead of allowing multiple edge points from the input image to match a single predicted edge point, the matching is limited to at most one input edge point for each predicted edge point.
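The at-most-one matching rule can be sketched as follows, in Python for brevity. Each predicted edge point looks up only its nearest input edge point and contributes a Gaussian weighted value if that point lies within a cutoff distance. The default cutoff of 7 pixels and sigma of 1.7 are the values this section states; the brute-force nearest-point search, the function name, and the normalization by the number of predicted points are assumptions made for illustration, not the implementation used in this work.

```python
import math


def single_point_edge_score(predicted_pts, input_pts, max_dist=7.0, sigma=1.7):
    """Score predicted edge points against input edge points, letting each
    predicted point match at most one (its nearest) input edge point."""
    if not predicted_pts:
        return 0.0
    total = 0.0
    for pu, pv in predicted_pts:
        # Distance to the nearest input edge point (brute force stand-in
        # for a precomputed distance map).
        nearest = min(
            (math.hypot(pu - iu, pv - iv) for iu, iv in input_pts),
            default=math.inf,
        )
        if nearest <= max_dist:
            # Gaussian weighting of the distance; beyond max_dist the
            # point contributes nothing.
            total += math.exp(-nearest ** 2 / (2.0 * sigma ** 2))
    # Assumed normalization: average contribution per predicted point.
    return total / len(predicted_pts)


predicted = [(10, 5), (10, 6), (10, 7)]
aligned = single_point_edge_score(predicted, [(10, 5), (10, 6), (10, 7)])
shifted = single_point_edge_score(predicted, [(11, 5), (11, 6), (11, 7)])
far = single_point_edge_score(predicted, [(100, 100)])
```

In this toy case the score is highest when the two edges coincide, drops smoothly for a one pixel offset, and is zero when no input edge point lies within the cutoff.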
By creating a distance map from the input edge image it is possible to compute the distances between all the predicted and input edge points (Figure 3.7). If a predicted edge point is within a certain maximum distance (7 pixels in this work) of an input edge point, then that distance is used when computing the contribution. If it is beyond that distance, no contribution is made.

Figure 3.7. The true edge near erroneous data points where the X-ray edge is spread.

Once the distance map D(u,v) is created from the input edge image, it is weighted using a Gaussian. The weighted distance map is defined as

D_w(u,v) = \exp\left(-\frac{D(u,v)^2}{2\sigma^2}\right)

where \sigma is 1.7.

Chapter 4 IMPROVEMENTS TO SEARCH METHOD

4.1 Cross-Correlation

Consider the case of optimizing translation only, assuming all other parameters are known. The Mahfouz method uses a semi-random search over XYZ and calculates the evaluation function (a product and sum) at each guess to the solution. Repeating this process for many guesses is very similar to performing the cross-correlation (XC) between two images, which can be computed quickly if implemented in the frequency domain [11]. In image processing XC is used for template matching, where the objective is to find the u, v image location of a template within a larger image. The equation for XC is defined below:

XC_j(u,v) = \sum_m \sum_n I_j(u+m, v+n) \, T_j(m,n)    (4.1)

where I_j(u,v) is the larger image, T_j(m,n) is the template, and XC_j(u,v) is the resulting matrix of correlation coefficients.

Incorporating XC into the evaluation function allows it to search over the u and v planar translations relative to each view, making it less dependent on prior knowledge of translation. However, there will always be some dependence on translation because the size of the templates is directly related to the relative depth of the model in each view.
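To make Equation 4.1 concrete, the sketch below evaluates the double sum directly at every offset where the template fits inside the image and then locates the peak. It is written in Python for brevity and uses plain nested lists; the function names are illustrative assumptions. The direct summation is for clarity only and is far slower than the frequency domain implementation referred to above.

```python
def cross_correlate(image, template):
    """Direct evaluation of XC(u, v) = sum_m sum_n I(u+m, v+n) T(m, n)
    for every offset (u, v) at which the template lies inside the image.
    Images are lists of rows, indexed as image[v][u]."""
    ih, iw = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    out = []
    for v in range(ih - th + 1):
        row = []
        for u in range(iw - tw + 1):
            s = 0.0
            for n in range(th):
                for m in range(tw):
                    s += image[v + n][u + m] * template[n][m]
            row.append(s)
        out.append(row)
    return out


def peak_location(xc):
    """Return the (u, v) index of the maximum correlation value."""
    best = max((val, v, u) for v, row in enumerate(xc) for u, val in enumerate(row))
    return best[2], best[1]


# Toy example: a 2 by 2 bright patch whose top-left corner is at u=1, v=3.
image = [[0.0] * 6 for _ in range(6)]
for v in (3, 4):
    for u in (1, 2):
        image[v][u] = 1.0
template = [[1.0, 1.0], [1.0, 1.0]]

xc = cross_correlate(image, template)
peak = peak_location(xc)
```

The peak index gives the planar offset of the best match in one view; one such peak per view is what the ray construction later in this section consumes.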
The modified evaluation function now only needs a hypothesis of rotation and a rough estimate of the translation, whereas the original function needed an exact hypothesis of both rotation and translation.

The index of the maximum of the correlation image is the u, v location on the image plane where the best match occurred. From this location, the intrinsic camera parameters, and the transform from camera to world coordinates, a direction can be calculated in the world reference frame. Using the origin of the camera as the starting point and the calculated direction, a ray can be cast that passes through the maximum location on the image plane (see Figure 4.1). With the rays from both views, the translation of the bone can be estimated by calculating the mid-point of the shortest line segment between the rays. This is the primary benefit and novelty of using XC: because translation can now be calculated directly, the number of DOF required in the optimization of the evaluation function is reduced by three.

Figure 4.1. Rays passing through the image planes of both camera views.

The final evaluation function only uses the XC modified edge score. Although XC can be applied to the region score as well, the region score has been dropped to reduce the overall computation time of the evaluation function. Incidentally, this also classifies this method as a purely feature based method.

4.2 Two Stage Optimization

The optimization strategy used can be broken into two stages (Figure 4.2). The first stage optimizes the evaluation function for the rotation angles and the first principal component of the atlas model using the genetic algorithm. Since the base model may be significantly different from the actual bone, the first principal component is used to find a rough estimate of the shape and scale of the bone. The second stage optimizes the rotation angles and the N principal components that represent 99% of the atlas model variation.
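Both stages rely on the translation estimate from Section 4.1, the mid-point of the shortest line segment between the two view rays. A minimal sketch of that geometry follows, in Python for brevity. It treats the rays as infinite lines, and the function name and example coordinates are illustrative assumptions rather than values from this work.

```python
def midpoint_between_rays(p1, d1, p2, d2, eps=1e-12):
    """Mid-point of the shortest segment joining the lines
    p1 + s*d1 and p2 + t*d2 (all arguments are 3-vectors)."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    w0 = [a - b for a, b in zip(p1, p2)]
    a = dot(d1, d1)
    b = dot(d1, d2)
    c = dot(d2, d2)
    d = dot(d1, w0)
    e = dot(d2, w0)
    denom = a * c - b * b
    if abs(denom) < eps:
        raise ValueError("rays are parallel; no unique closest points")
    # Parameters of the closest points on each line.
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    q1 = [p + s * k for p, k in zip(p1, d1)]
    q2 = [p + t * k for p, k in zip(p2, d2)]
    return tuple((x + y) / 2.0 for x, y in zip(q1, q2))


# Two skew rays from two hypothetical camera origins.
estimate = midpoint_between_rays(
    (0.0, 0.0, 0.0), (1.0, 0.0, 0.0),
    (5.0, -3.0, 2.0), (0.0, 1.0, 0.0),
)
```

For roughly orthogonal views, as used here, the denominator stays well away from zero, so the parallel case is guarded only for completeness.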
The second stage also uses the genetic algorithm, where the bounds on the parameters optimized in the first stage have been reduced. An intermediate step occurs at the end of each stage where the translation is calculated from the maximum correlation peak locations, using the optimized solution of that stage in the XC function call.

Figure 4.2. Flow chart of the two stage optimization strategy.

Chapter 5 IMPLEMENTATION DETAILS

All the code and programs used for this research were implemented in Matlab. The deformable atlas models, all code pertaining to the digitally reconstructed radiographs and the CT data sets were provided by Dr. Mohamed Mahfouz, Brandon Merkl and Mike Kuhn of the University of Tennessee in Knoxville.

5.1 Genetic Algorithm

A genetic algorithm (GA) is a semi-random search method that simulates evolution. Given a population of individuals and a fitness function to score each individual, the GA uses a selection function where the highest scoring individuals are more likely to be selected to reproduce or mutate (evolve) into the next generation. In terms of optimization, an individual is a guess to the solution of the evaluation function being optimized. Initially a population of individuals is randomly generated and doped with a reasonable guess to the solution of the evaluation function. At each generation new guesses are generated using crossover and mutation functions. The new guesses are evaluated and then replace the lowest scoring individuals in the population. An individual in this case is the vector comprised of the translation and rotation values and the coefficients of the first N principal components. This process is then repeated until a termination condition is met or until a maximum number of generations is reached.

The GA used is a Matlab implementation that I wrote, based on the GAOT toolbox written by Houck et al. [12].
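The loop described above can be sketched as follows. This is a simplified illustration in Python, not the Matlab implementation used in this work: it assumes a single arithmetic crossover, a single-gene Gaussian mutation applied to every child, and roulette selection on shifted scores. All names, default values, and the toy fitness function are hypothetical.

```python
import random


def run_ga(fitness, bounds, seed_guess, pop_size=30, children=10,
           generations=150, sigma=0.1, rng=None):
    """Maximize `fitness` over a box given by `bounds` (list of (lo, hi))."""
    rng = rng or random.Random(0)
    # Random initial population, doped with a reasonable starting guess.
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size - 1)]
    pop.append(list(seed_guess))
    scores = [fitness(ind) for ind in pop]
    for _ in range(generations):
        # Roulette selection: shift scores so every weight is positive.
        low = min(scores)
        weights = [s - low + 1e-9 for s in scores]
        new = []
        for _ in range(children):
            a, b = rng.choices(pop, weights=weights, k=2)
            mix = rng.random()
            # Arithmetic crossover: convex combination of the two parents.
            child = [mix * x + (1.0 - mix) * y for x, y in zip(a, b)]
            # Single-gene Gaussian mutation, clamped to the bounds.
            g = rng.randrange(len(child))
            lo, hi = bounds[g]
            child[g] = min(hi, max(lo, child[g] + rng.gauss(0.0, sigma * (hi - lo))))
            new.append(child)
        # New guesses replace the lowest scoring individuals, so the best
        # individual always survives (elitism) while children < pop_size.
        order = sorted(range(pop_size), key=lambda i: scores[i])
        for idx, child in zip(order, new):
            pop[idx] = child
            scores[idx] = fitness(child)
    best = max(range(pop_size), key=lambda i: scores[i])
    return pop[best], scores[best]


# Toy fitness with a known maximum at `target` (a stand-in for the
# evaluation function, which is not reproduced here).
target = [0.3, -0.2, 0.5]


def toy_fitness(x):
    return -sum((xi - ti) ** 2 for xi, ti in zip(x, target))


bounds = [(-1.0, 1.0)] * 3
best, best_score = run_ga(toy_fitness, bounds, seed_guess=[0.0, 0.0, 0.0])
```

Because only the lowest scoring individuals are replaced, the best score in the population never decreases from one generation to the next, which is the property the elitist model is meant to guarantee.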
Blend, arithmetic and uniform crossover as well as single/multi-gene Gaussian mutation operators were used to create the new guesses for each generation. The individuals used in the operators were selected using a roulette selection function. The elitist model, where the best individual of the current population always survives to the next generation, was also used.

5.2 Digitally Reconstructed Radiographs

Digitally reconstructed radiographs (DRRs) are synthetic X-ray images and were used in this work instead of real X-rays because the ground truth, if not explicitly known, can be found fairly easily. The DRRs are created by casting rays through image slices of a CT scan (Figure 5.1). The CT scan data is stored in a series of DICOM (Digital Imaging and Communications in Medicine) files, which contain information such as the slice image, slice thickness, slice location and the conversion of pixel dimensions to SI units. Using this information from the DICOM images, a continuous volume can be approximated using 3D interpolation.

Figure 5.1. Casting rays through a CT volume

Casting rays through the CT volume requires defining a virtual camera from which the rays will be cast. The virtual camera itself is based on the pinhole camera model with a focal length of 1199 mm, a field of view of 0.1467 radians (8.4 degrees) and image plane dimensions of 640 by 480 pixels (see footnote 1). The directions of the rays that pass through each pixel in the image plane can be calculated with respect to the camera origin. These directions are then mapped using a transformation matrix from the default camera view to the desired camera view. The rays themselves are subdivided into a series of finite measurement locations along each ray. Each measurement location has an XYZ position that is used in the 3D interpolation with the CT volume to determine the CT number (intensity) at that location.
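The per-pixel ray directions and measurement locations can be sketched as follows, in Python for brevity. The sketch works in the default camera frame only and makes two assumptions that the text does not state: that the quoted field of view spans the image width, and that the principal point sits at the image center. The function names are illustrative, and the mapping to the desired camera view by a transformation matrix is omitted.

```python
import math


def pixel_ray_direction(u, v, width=640, height=480, fov=0.1467):
    """Unit direction, in the camera frame, of the ray through pixel (u, v).
    Assumes the field of view spans the image width and the optical axis
    passes through the image center (width/2, height/2)."""
    f_px = (width / 2.0) / math.tan(fov / 2.0)   # focal length in pixels
    x = u - width / 2.0
    y = v - height / 2.0
    norm = math.sqrt(x * x + y * y + f_px * f_px)
    return (x / norm, y / norm, f_px / norm)


def sample_points(origin, direction, start, stop, step):
    """Measurement locations along a ray, from `start` to `stop`
    (distances from the origin) in increments of `step`."""
    n = int(math.floor((stop - start) / step)) + 1
    return [
        tuple(o + (start + i * step) * d for o, d in zip(origin, direction))
        for i in range(n)
    ]


center_dir = pixel_ray_direction(320, 240)
edge_dir = pixel_ray_direction(0, 240)
edge_angle = math.atan2(-edge_dir[0], edge_dir[2])   # angle off the optical axis
points = sample_points((0.0, 0.0, 0.0), center_dir, 0.0, 10.0, 2.5)
```

Each returned point would then be passed to the 3D interpolation of the CT volume; a smaller step gives a finer approximation of the line integral at the cost of more interpolation calls.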
The CT number is related to the X-ray linear attenuation coefficient and is used in Equation 5.1 to simulate the X-ray image at each u, v pixel location:

DRR(u,v) = I_o - I_o \exp\left(-\sum_j \mu_j(u,v) \, t\right)    (5.1)

where I_o is the maximum image intensity, \mu_j(u,v) is the matrix of estimated linear attenuation coefficients at the jth measurement location and t is the measurement length.

Footnote 1: The values used for the virtual camera match those used to model the fluoroscope in the original work.

5.3 Virtual Reality Toolbox and Camera Calibration

The Virtual Reality Toolbox (VRT) is an add-on package for Matlab that interacts with virtual environments written in Virtual Reality Modeling Language (VRML). It was used to render the predicted images of the 3D bone models, which are needed for the evaluation function. To render the images at the specific scale and camera views used in the generation of the DRRs, a viewing window is defined which can be oriented to each camera position; however, the VRT does not allow the user to explicitly define the field of view or focal length. The toolbox does have a parameter called the "Zoomfactor" that effectively adjusts the field of view, but how it is adjusted or what is actually being adjusted is not documented in Matlab. Therefore the viewing window needs to be calibrated using the "Zoomfactor" so that the relative scale of the predicted and DRR images is the same.

To calibrate the VRT viewing window to the DRR camera, the same object needs to be rendered in both systems. The object chosen was a 3D bar, which was first generated by emulating CT DICOM files containing the bar slices. Using the pixel spacing and the distance between slices, an equivalent VRML model was then produced for the VRT. Once both systems were able to render the small known object, the "Zoomfactor" was iteratively adjusted until the images produced from both systems were approximately the same. The value of 2.7879 was used for the "Zoomfactor" (see footnote 2).
2 This is a unique value to Matlab R2006b and may not be the same in different versions. 22 Chapter 6 EVALUATION AND TESTING The evaluation and testing of my improvements to the Mahfouz method was performed in three parts. First the improvements to the evaluation function were tested to verify whether or not there was any improvement over the original method. Second the improvements to the search method were compared to the modified Mahfouz method for the rigid optimization using the known bone model. The third and final part evaluates the full algorithm as defined in Chapter 4.2 for both the right femur and L5 lumbar. 6.1 Improvements to Evaluation Function To evaluate the changes made to the evaluation function a search was performed with the GA over the rigid parameters using the known shape with approximately 12500 function evaluations. This was repeated 10 times for two different data sets for a total of 20 trials. Each trial used a random starting pose each with a magnitude error of 1.5 cm in translation and 8 degrees in rotation. The 20 trial search was applied to four different cases for the purpose of comparison. Case 1 is the original Mahfouz method, case 2 applies the correction for blurring, case 3 uses the blurring correction and edge segmentation and case 4 uses all three improvements, namely the correction for blurring, edge segmentation and the spreading of the X-ray edge instead of the predicted edge. Once the all the trials were complete the average magnitude errors for each case were computed as well as 23 the corresponding 90% confidence interval assuming a t-distribution. The results for all 4 cases for the femur are shown graphically in Figure 6.1, where the values are displayed in Table 6.1. The overall results show that the errors of all four case overlap when their confidence intervals are considered and therefore they do not adequately show wither or not the modifications improved the evaluation function. Figure 6.1. 
The plot of mean errors and the corresponding 90% confidence intervals (error bars) for the four test cases used for the femur rigid optimization.

            Case 1          Case 2          Case 3          Case 4
|T| (mm)    0.865 ± 0.257   0.763 ± 0.227   0.866 ± 0.257   0.782 ± 0.232
|R| (deg)   0.697 ± 0.207   0.595 ± 0.177   0.979 ± 0.256   0.614 ± 0.182

Table 6.1. Mean errors and 90% confidence intervals for the four test cases used for the femur rigid optimization.

Figure 6.2. The plot of mean errors and the corresponding 90% confidence intervals (error bars) for the four test cases used for the lumbar rigid optimization.

The results for all 4 cases for the lumbar are shown graphically in Figure 6.2, and the values are listed in Table 6.2. Although the results of the modifications when applied to the lumbar are different from the original method (case 1), they are not very different from each other.

            Case 1          Case 2          Case 3          Case 4
|T| (mm)    5.627 ± 1.671   0.294 ± 0.087   0.649 ± 0.193   0.552 ± 0.164
|R| (deg)   3.522 ± 1.046   0.800 ± 0.238   0.831 ± 0.247   0.973 ± 0.289

Table 6.2. Mean errors and 90% confidence intervals for the four test cases used for the L5 lumbar rigid optimization.

Comparing the results of the femur and lumbar together, the lowest errors in both translation and rotation occurred in case 2, where only the correction for blurring was applied to the evaluation function.

6.2 Improvements to Search Method

To test the improvements to the search method, two cases were considered in which the rigid parameters were optimized using the GA for a known shape. The first case uses the Mahfouz method with the improvements from the previous section implemented. The second case is similar but uses cross-correlation (XC) in the evaluation function and calculates translation using the ray calculation every ten generations. For each case, 10 trials were run, each with a random starting pose with magnitude errors of 1.5 cm and 8 degrees in translation and rotation respectively.
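One way to generate random starting poses with a fixed error magnitude is sketched below in Python. The sketch assumes that "magnitude error" means the Euclidean norm of the translation offset (in cm) and of the vector of three rotation-angle offsets (in degrees); that definition, the function names and the sampling scheme are assumptions for illustration and are not taken from the original implementation.

```python
import math
import random


def random_offset(magnitude, rng):
    """Return a 3-vector with a uniformly random direction and the given length."""
    while True:
        # A normalized Gaussian sample is uniformly distributed over the sphere.
        vec = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(c * c for c in vec))
        if norm > 1e-12:
            return [magnitude * c / norm for c in vec]


def random_start_pose(true_translation, true_rotation, rng,
                      trans_error=1.5, rot_error=8.0):
    """Perturb a known pose by fixed-magnitude errors (cm and degrees)."""
    d_trans = random_offset(trans_error, rng)
    d_rot = random_offset(rot_error, rng)
    start_translation = [t + d for t, d in zip(true_translation, d_trans)]
    start_rotation = [r + d for r, d in zip(true_rotation, d_rot)]
    return start_translation, start_rotation


# Ten hypothetical trials around a known pose at the origin.
rng = random.Random(0)
starts = [random_start_pose([0.0, 0.0, 0.0], [0.0, 0.0, 0.0], rng)
          for _ in range(10)]
```

Fixing the magnitude while randomizing only the direction gives every trial the same initial difficulty, which keeps the averaged error curves comparable between cases.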
Each trial was run for 150 generations with approximately 35 function evaluations per generation. During the trials the magnitude errors in translation and rotation were computed per generation of the search. These errors were then averaged across the trials at each corresponding generation and are displayed graphically in Figures 6.3 and 6.4.

Figure 6.3. Comparison of the average magnitude errors of the femur with the standard search method in blue and the cross-correlation search method in green.

Figure 6.4. Comparison of the average magnitude errors of the lumbar with the standard search method in blue and the cross-correlation search method in green.

For both the femur and lumbar, the magnitude errors achieved by the XC search method in approximately 10 generations are smaller than the errors achieved by the original search method at generation 150. In terms of time, the XC search method also arrived at a better answer faster than the original search method. For the femur, the error plots for the original method still have a negative slope, so it is possible that given enough generations it would have achieved the same errors as seen with the XC method. For the lumbar, the original method appears to have a slope close to zero, and it is likely that it is consistently getting stuck in a solution regardless of its starting position. Also, the disparity in the errors for the lumbar is much larger than it is for the femur.

6.3 Experimental Results of the Full Algorithm

The full algorithm, with the improvements to the evaluation function and the two-stage search method, was evaluated on DRR images of the femur and lumbar vertebra using the deformable atlas models. The magnitude errors used for the starting translation and rotation were 1.5 cm and 8 degrees. The search space was bounded to ±8.8 degrees for each rotation angle and to weights of ±0.75 on the principal components.
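The bounded search space described above can be written down as a list of per-parameter limits. The Python sketch below assumes that the rotation bounds are centered on the starting rotation estimate and that the search vector holds three rotation angles followed by the principal component weights (translation being calculated rather than searched); both the centering and the helper names are assumptions for illustration.

```python
def build_bounds(start_rotation_deg, num_components,
                 rot_half_width=8.8, weight_half_width=0.75):
    """Return (low, high) limits for [rx, ry, rz, w_1, ..., w_N]."""
    bounds = [(angle - rot_half_width, angle + rot_half_width)
              for angle in start_rotation_deg]
    bounds += [(-weight_half_width, weight_half_width)] * num_components
    return bounds


def clamp(candidate, bounds):
    """Project a candidate parameter vector back inside the search bounds."""
    if len(candidate) != len(bounds):
        raise ValueError("candidate and bounds must have the same length")
    return [min(max(value, low), high)
            for value, (low, high) in zip(candidate, bounds)]


# Example: three rotation angles plus nine shape weights (an example size).
bounds = build_bounds([0.0, 0.0, 0.0], num_components=9)
candidate = [10.0, -3.0, 0.5] + [1.0] * 9
inside = clamp(candidate, bounds)
```

Under this assumed centering, a starting rotation error of 8 degrees in magnitude always leaves the true rotation inside the ±8.8 degree limits.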
The two atlas models used consisted of 188 femurs and 14 L5 lumbar vertebrae, where the data sets evaluated were not included in the construction of the atlases. Since the ground truth can only be estimated for rotation and translation when using the atlas models, the results were evaluated by computing the point-to-surface errors from the optimized atlas models to the known models. This was done by performing iterative closest point (ICP) [13] between the optimized atlas models and known models.

6.3.1 Femur Results

For the femur, the pose and the first 9 principal components of the atlas were estimated, as described in Chapter 4.2, where the GA is used to optimize rotation and shape, and the translation is calculated. Table 6.3 shows the point-to-surface errors for the optimized atlas model for the 27 data sets evaluated, and Table 6.4 shows the analysis of those errors. Figures 6.5 and 6.6 show typical starting and ending poses with the predicted silhouettes overlaid on the DRRs. Although the mean RMS error is 0.387 cm, the mean of the maximum error is 1.500 cm. This suggests that most of the vertices in the optimized atlas model were fairly close to the known model with the exception of some outliers, which typically show up at the extremities of the bone, as can be seen in Figure 6.6.

Data set   Max Error (cm)   RMS Error (cm)     Data set   Max Error (cm)   RMS Error (cm)
1          1.198            0.279              15         1.295            0.393
2          2.106            0.525              16         1.202            0.279
3          1.666            0.426              17         0.516            0.202
4          1.358            0.361              18         0.544            0.233
5          2.062            0.460              19         0.994            0.294
6          1.201            0.322              20         2.512            0.542
7          0.911            0.318              21         2.535            0.521
8          2.586            0.561              22         1.290            0.324
9          1.459            0.442              23         2.081            0.494
10         2.170            0.526              24         1.030            0.384
11         2.227            0.602              25         0.739            0.192
12         1.718            0.457              26         2.181            0.495
13         0.848            0.287              27         1.340            0.313
14         0.683            0.206

Table 6.3. Optimized femur shape point-to-surface errors.

                 Min     Max     Mean    Std
Max Error (cm)   0.516   2.586   1.500   0.641
RMS Error (cm)   0.192   0.602   0.387   0.122

Table 6.4. Analysis of optimized femur shape point-to-surface errors.

Figure 6.5.
Typical starting pose for the right femur.

Figure 6.6. Typical optimized shape for the right femur at the ICP registered rigid pose.

6.3.2 L5 Lumbar Vertebra Results

For the lumbar vertebra, 13 principal components and the pose were estimated, as described in Chapter 4.2, where the GA is used to optimize rotation and shape, and the translation is calculated. Table 6.5 shows the point-to-surface errors for the optimized atlas model for the 2 data sets evaluated. Figures 6.7 and 6.8 show typical starting and ending poses with the predicted silhouettes overlaid on the DRRs.

Data set   Max Error (cm)   RMS Error (cm)
1          0.544            0.160
2          0.482            0.157
Mean       0.513            0.158

Table 6.5. Optimized lumbar shape point-to-surface errors.

The mean RMS error is 0.158 cm and the mean maximum error is 0.513 cm. As with the femur, this suggests that most of the vertices in the optimized atlas model were fairly close to the known model with the exception of some outliers (see Figure 6.6). The ratio of the mean maximum error to the mean RMS error for the lumbar is 3.269 to 1, which is better than the femur at 3.876 to 1. This difference may be due to the fact that the lumbar DRRs are rendered at a third of the distance from the virtual camera compared with the femur DRRs; therefore the pixel resolution is effectively three times larger.

Figure 6.7. Typical starting pose for the L5 lumbar vertebra.

Figure 6.8. Typical optimized shape for the L5 lumbar vertebra at the ICP registered rigid pose.

Chapter 7 SUMMARY

7.1 Major Results and Conclusions

The evaluation function has been improved by correcting for the misalignment of edges due to blurring, segmenting the edges, and spreading the input X-ray segmented edge images instead of the predicted edge images. The bi-planar search method was also improved by incorporating cross-correlation into the evaluation function and calculating translation instead of optimizing it.
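The cross-correlation term mentioned above can be illustrated with a generic normalized cross-correlation between two equally sized images. The Python sketch below shows the standard zero-offset measure only; it is not the implementation used in this work, and the function name and toy images are hypothetical.

```python
import math


def normalized_cross_correlation(image_a, image_b):
    """Zero-offset normalized cross-correlation of two equally sized images.

    Returns a value in [-1, 1]; 0.0 is returned when either image is constant,
    because the measure is undefined in that case.
    """
    a = [pixel for row in image_a for pixel in row]
    b = [pixel for row in image_b for pixel in row]
    if not a or len(a) != len(b):
        raise ValueError("images must be non-empty and the same size")

    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    numerator = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    energy_a = sum((x - mean_a) ** 2 for x in a)
    energy_b = sum((y - mean_b) ** 2 for y in b)
    denominator = math.sqrt(energy_a * energy_b)
    return numerator / denominator if denominator > 0.0 else 0.0


# Hypothetical 2 x 2 edge images: one pattern and its inverse.
edges = [[0.0, 1.0], [1.0, 0.0]]
inverted = [[1.0, 0.0], [0.0, 1.0]]
score_same = normalized_cross_correlation(edges, edges)
score_opposite = normalized_cross_correlation(edges, inverted)
```

Because the means are subtracted and the result is divided by the image energies, the score is insensitive to uniform changes in brightness and contrast between the two images. Reference [11] describes a fast way to evaluate the same measure over many offsets.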
The most significant improvement to the original evaluation function is the correction of edge misalignments due to blurring. Although the simple cylinder approximation used to calculate the edge shift was reasonable for the femur, it was not for the lumbar vertebra due to its irregular shape. In spite of this, the cylinder approximation did improve the results for the lumbar vertebra as well. The XC search method is another significant improvement over the original search method and was shown for the rigid case to outperform the original search method.

As for the evaluation of the full algorithm, the modeling errors (mean maximum errors of 1.500 cm for the femur and 0.513 cm for the lumbar vertebra) are much too large for the optimized models to be realistically used in computer aided surgery, where the ideal error in terms of orthopedic surgery is less than 2 mm.

This work was limited to DRRs, where near-perfect camera calibration is possible and realistic effects such as varying source intensity and X-ray beam hardening do not occur. To be applied to real X-ray images, a robust calibration process would need to be used to account for these effects. Another limitation is the overall optimization speed, or the time required to optimize the atlas models, which is not currently fast enough for intraoperative procedures but is fast enough to be used preoperatively to create patient specific bone models.

Another limitation is that the atlas model used for the lumbar consisted of a small number of models, and only 2 data sets were available to test against. It is likely that with a larger atlas and more data sets the results would be much different. Also, the variations of the atlas are ultimately defined by the statistics of the contained models; an unusual bone, possibly deformed or exhibiting variation not captured by those statistics, would not likely have a very accurate atlas model description. To minimize this possibility, many more models would need to be incorporated into the atlas.
The disadvantage of using a great number of models in the atlas is that the total number of available PCs is directly proportional to the number of models contained in the atlas. This is a problem because the first N PCs needed to represent 99% of the model variation would also scale with the total number of available PCs; therefore using too many models in the atlas could result in an N-dimensional space too large to optimize. To maximize the statistical variation of the atlas and minimize the number of PCs needed to represent the models, more than one atlas could be constructed for the same bone. Each atlas could be based on a different patient demographic, such as age, ethnicity or sex.

7.2 Recommendations for Future Work

Using only two camera views does not allow for direct observation of R_z, the common non-planar rotation angle between the two views. Using a third view that is not approximately orthogonal to this non-planar rotation angle may further reduce the errors in rotation. Using different views, such as 45 degrees from the knee cap or spine, may also reduce the errors in shape by providing more edges that are not observable in the orthogonal views.

If more information, such as the internal edges, is rendered in the predicted images and considered by the evaluation function, the point-to-surface errors of the optimized atlas models may be further reduced. As a crude approximation to an X-ray image, the predicted images could be generated by rendering the model partially transparent (Figure 7.1), and the corresponding edge image would then have some internal edges. In VRML, this can be done quickly by simply changing the surface properties of the model prior to rendering. The reason this would be a crude approximation is that the current model is represented only as a surface (no interior) and is rendered using self illumination; therefore the internal edges revealed by rendering the model partially transparent would not necessarily correspond to the internal edges in the X-ray images.

Figure 7.1. Lumbar vertebra rendered partially transparent.

A better approximation to an X-ray image would be to render the surface model as a DRR with a uniform density. In general, bones have a non-uniform density, so the approximation could be further improved by developing a bone atlas that considers not only the surface of the bone but also its internal density. Generating DRRs from this volumetric bone atlas would then give the best overall approximation to the X-ray images. Also, rendering the predicted images in this manner would give the option to compare them directly to the X-ray images.

The feasibility of using DRRs as predicted images does, however, depend on how fast they can be rendered. In the existing Matlab code it takes about 3.5 minutes to generate each 640 by 480 DRR with 256 measurements along each projected ray on a computer using a Pentium 4 3.0 GHz dual-core processor. While optimizing the existing code or using a different method altogether may speed up the process, ultimately I think the ideal route would be to render the DRRs in hardware.

REFERENCES

[1] R. H. Taylor, S. Lavallee, G. C. Burdea, and R. Mosges, Eds., Computer-Integrated Surgery. Cambridge, Massachusetts and London, England: The MIT Press, 1996.
[2] M. Kuhn, M. Mahfouz, E. ElHak, and B. Merkl, "Reconstruction of 3d patient-specific bone models from biplanar x-ray images," in 12th International Conference on Biomedical Engineering, Singapore, Dec. 2005.
[3] M. R. Mahfouz, W. A. Hoff, R. D. Komistek, and D. A. Dennis, "A robust method for registration of three-dimensional knee implant models to two-dimensional fluoroscopy images," IEEE Transactions on Medical Imaging, vol. 22, no. 12, pp. 1561–1574, Dec. 2003.
[4] G. P. Penney, J. Weese, J. A. Little, P. Desmedt, D.
L. G. Hill, and D. J. Hawkes, "A comparison of similarity measures for use in 2-d - 3-d medical image registration," IEEE Transactions on Medical Imaging, vol. 17, no. 7, pp. 586–595, Aug. 1998.
[5] S. Benameur, M. Mignotte, S. Parent, H. Labelle, W. Skalli, and J. de Guise, "3d/2d registration and segmentation of scoliotic vertebrae using statistical models," Computerized Medical Imaging and Graphics, vol. 27, pp. 321–337, 2003.
[6] Y. Zheng, M. S. Nixon, and R. Allen, "Automated segmentation of lumbar vertebrae in digital videofluoroscopic images," IEEE Transactions on Medical Imaging, vol. 23, no. 1, pp. 45–52, Jan. 2004.
[7] B. Merkl and M. Mahfouz, "Unsupervised three-dimensional segmentation of medical images using an anatomical bone atlas," in 12th International Conference on Biomedical Engineering, Singapore, Dec. 2005.
[8] I. T. Jolliffe, Principal Component Analysis. Berlin, Germany: Springer-Verlag, 1986.
[9] R. C. Gonzalez and R. E. Woods, Digital Image Processing, 2nd ed. Upper Saddle River, New Jersey: Prentice Hall, 2001.
[10] J. Canny, "A computational approach to edge detection," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 8, no. 6, pp. 679–698, Nov. 1986.
[11] J. P. Lewis. Fast normalized cross-correlation. Industrial Light & Magic. [Online]. Available: http://www.idiom.com/~zilla/Papers/nvisionInterface/nip.html
[12] C. Houck, J. Joines, and M. Kay, "A genetic algorithm for function optimization: A matlab implementation," North Carolina State University, Raleigh, NC, Tech. Rep. NCSU-IE-TR-95-09, 1995. [Online]. Available: http://www.ise.ncsu.edu/mirage/GAToolBox/gaot/
[13] P. Besl and N. McKay, "A method for registration of 3-d shapes," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 14, no. 2, pp. 239–256, Feb. 1992.