Investigating the Role of Simulation Fidelity in Laparoscopic Surgical Training

by

Hyun K. Kim

B.S., Mechanical Engineering
Massachusetts Institute of Technology, 2000

Submitted to the Department of Mechanical Engineering in Partial Fulfillment of the Requirements for the Degree of Master of Science in Mechanical Engineering at the

MASSACHUSETTS INSTITUTE OF TECHNOLOGY

August 2002

© 2002 Massachusetts Institute of Technology. All rights reserved.

Author: Department of Mechanical Engineering, August 9, 2002

Certified by: Dr. Mandayam A. Srinivasan, Senior Research Scientist, Mechanical Engineering, Thesis Supervisor

Accepted by: Prof. Lallit Anand, Professor of Mechanical Engineering, Chair, Department Committee for Graduate Students

Investigating the Role of Simulation Fidelity in Laparoscopic Surgical Training

by Hyun K. Kim

Submitted to the Department of Mechanical Engineering on August 9, 2002 in Partial Fulfillment of the Requirements for the Degree of Master of Science in Mechanical Engineering

ABSTRACT

Minimally invasive surgery (MIS), with its advantages of quick recovery and minimal scarring, has revolutionized surgery over the past few years. As a result, the development of a VR-based surgical trainer for MIS has been a popular area of research. However, there remains a fundamental question: how realistic does the simulation have to be for effective training? On the one hand, learning surgical practices with an unrealistic model may lead to negative training transfer. On the other hand, because of the learning abilities and perceptual limitations of the sensory, motor, and cognitive systems of the human user, a perfect simulation is unnecessary. Furthermore, given the large variations in human anatomy and physiology, there is no single perfect model.
The question is how simple a simulation we can get away with, while at the same time preserving a level of fidelity between the virtual and real organ behavior that leads to positive training transfer. A dual-station experimental platform was set up for this study. The two stations consisted of a real environment testing station and a virtual environment training station. The fidelity of the simulation could easily be adjusted in the virtual training station, so that subjects could be trained under different modes of training. With the dual-station setup, the real environment performance of a subject before and after VE training could be measured. A first round of experiments on the setup was conducted to investigate the effect of haptic fidelity and the effect of part-task training on surgical training. Haptic fidelity was adjusted by modeling a material of nonlinear stiffness to different degrees of accuracy. Subjects were initially tested on the real station performing a bimanual pushing and cutting task. They were then trained on the virtual station with one of three different levels of haptic fidelity or with the part-task trainer. Once the training was complete, the subjects were again evaluated on the real environment station to gauge their improvement in skill level. Initial results showed a marked difference in skill improvement between training with haptics and without. However, there was no significant difference in the training effectiveness of the higher fidelity and lower fidelity models of elasticity. Part-task training also proved to be an equally effective method of training for the surgical task chosen. Experiments with modeling material nonlinearity are one of many studies that can be done on this platform, including adjusting other modes of haptic fidelity, such as viscoelasticity, and experiments with graphic fidelity.
Results from such experiments can serve as the basis of future surgical simulation development by providing guidelines on the environment fidelity required for positive training transfer to occur.

Table of Contents

1 Introduction
1.1 Introduction
1.2 Summary of Previous Work
1.3 Motivation
1.4 Paradigm
1.5 Analogies with Flight Simulator Development
2 Design of Platform and Experimental Methods
2.1 Design of the Experiment Platform
2.1.1 Overview
2.1.2 Design of Task
2.1.3 Implementation of Tasks
2.1.4 Performance Measurements
2.2 Design of Validating Experiments
2.2.1 Factors that Affect Virtual Environment Training
2.2.2 Haptic Fidelity
2.2.3 Part Task vs. Whole Task Training
2.3 Experimental Procedure
3 Experimental Results
3.1 Results from Testing Station
3.2 Total Score and Skill Improvement
3.3 Learning Curve for Training
4 Analysis and Discussions
4.1 Training Effectiveness
4.2 Statistical Analysis
4.3 Learning Curve
4.4 Results from Further Subjects
5 Conclusions and Future Work
5.1 Conclusions
5.2 Future Work
References

List of Figures and Tables

Figure 1.1: Experimental setup with the real station and virtual station
Figure 2.1: Experiment setup showing (a) the real environment test station and (b) the virtual environment training station
Figure 2.2: Two phantoms (a 1.5A model and a 1.0A model) connected to the end of the surgical tools in the virtual station
Figure 2.3: System for real-time simulation with both graphic and haptic feedback
Figure 2.4: Hybrid system which produces "virtual" forces from the phantom and "real" forces from the abdominal wall
Figure 2.5: Sequence of operation for the tasks chosen in the experiment platform. The figure shows the tasks done in the virtual environment for ease of presentation. The tasks and objects in the real environment station are the same.
Figure 2.6: (a) The operation scene in the real environment test station. The blocks are mounted on linear sliding bearings, spring loaded and connected to LVDTs.
Figure 2.7: Surgical tool with pen tip installed at the end
Figure 2.8: Factors that affect general virtual environment training
Figure 2.9: Force displacement plot for rubber spring material
Figure 2.10: (a) Linear approximation to force displacement curve, (b) nonlinear (exponential) approximation to force displacement curve. The order of magnitude of the stiffness is in agreement with what has been measured experimentally for a pig's esophagus (0.005 N/mm - 0.02 N/mm).
Figure 3.1: Sample of raw data from the real environment test station for subject P3, before and after training. The data shows (a) profile of block position, (b) incision marks, (c) time and (d) number of obstacles hit.
Figure 3.2: Bar chart of the total performance score before and after training using the nonlinear elastic model
Figure 3.3: Bar chart of the total score before and after training for the (a) linear haptic trainer, (b) no haptic trainer, (c) part task trainer
Figure 3.4: Learning curves for subjects who received training on the nonlinear elastic model VR trainer
Figure 3.5: Learning curves for subjects who received training on the linear elastic model VR trainer
Figure 3.6: Learning curves for subjects who received training on the VR trainer without haptics
Figure 4.1: Average training effectiveness plot
Figure 4.2: Performance improvement shown with initial skill level group
Figure 4.3: Average learning curve for training on the VR trainer. Error bars show standard deviation among subjects.
Figure 4.4: Average training effectiveness plot (revised)
Table 1: Performance metrics
Table 2: Experimental design table showing how the subjects were divided and trained. N1, N2, etc. represent the subjects.
Table 3: Sample scoring table for the evaluation test conducted on the real environment test station before and after training using the nonlinear elastic model (for 5 subjects)
Table 4: Improvement in performance score (training effect) after virtual reality training
Table 5: ANOVA table calculations for a two-factor factorial test with n replicates of each treatment combination [44]. n=1, a=4 and b=5 for this experiment.
Table 6: ANOVA table of results

1 Introduction

1.1 Introduction

The potential of developing a virtual reality based surgical simulation has captivated the imagination of scientists and engineers across multiple disciplines in recent years. Although simulations of various surgical procedures have been attempted, the majority of research has concentrated on simulating minimally invasive surgery (MIS). Many surgeons agree that MIS has revolutionized surgery over the past decade. Over 2.5 million surgeries were conducted through minimally invasive procedures in the US in the past year alone. Minimally invasive surgery is performed through small incisions made in the outer layer (usually the abdominal wall or skull) of the operating region. An endoscope and surgical tools are inserted through these incisions. The operation is done using long slender surgical tools that are pivoted by a trocar at the outer layer. The endoscope provides visual feedback of the scene of operation on a CRT screen. Most of the many advantages of MIS can be attributed to the fact that very small incisions are made. This results in much less pain to the patient than in traditional surgery. Also, the recovery time is shortened considerably, allowing the patient to leave the hospital within days of the operation. Not only is this an obvious advantage to the patient, but it also benefits the hospital by shortening hospital stays, reducing the load on sick beds and nurses. However, there are difficulties in performing these procedures on the surgeon's part.
Most endoscopes provide only a 2-D view, making visual depth perception difficult. Also, the surgeon can only view the operation scene on a remote CRT screen, which creates difficulties in hand-eye coordination. There is also the problem of minimal tactile feedback from tool-tissue interaction, mainly because the surgeon feels the organs through the end of a long tool and the forces at the trocar/abdominal wall interface dominate. Not only is it hard to detect small forces on the tool tip from the soft organ tissue, it is also hard to manipulate the long surgical tools compared to the direct use of hands. Finally, the high frequency tremors of the hand are amplified at the end of a long tool, making precise tool control even more difficult. Due to such difficulties, training for MIS is a long and arduous process. The traditional method of training has been apprenticeship, where the surgical resident learns by observing and assisting an expert surgeon until they are ready to perform surgical procedures on their own. However, there are obvious disadvantages to this type of training. First of all, there is risk to the patient as surgeons who have not completed their training hone their skills in the actual operating room. Also, the accessibility of training can be a problem for the residents: training can only be done when a patient is available, not when the trainee requires it. Also, the feedback to the trainee is qualitative, and its quality can vary widely depending on the surgeon overseeing the training. Such shortcomings of the traditional method have led to the development of virtual reality based surgical simulators. If MIS procedures could be simulated using a PC and simple visual and haptic interfaces, there would certainly be numerous advantages over traditional methods. First of all, a simulator would provide an environment for training without risk of injury to the patient.
Also, the training would always be available as often as the trainee desires, and the feedback from the training sessions can be immediate and quantitative. Customized software can be used to simulate various and even rare surgical procedures. Also, the simulation would allow the surgical environment to be controlled to produce the specific conditions that are desired. Therefore an effective virtual reality based trainer would be an ideal platform for training both novice and expert surgeons, and thus a great deal of research has been done in this area over the past 10 years.

1.2 Summary of Previous Work

Extensive effort has been spent developing surgical simulators that help users study anatomy and practice medical procedures [1-3]. Satava [4] developed one of the first surgical simulators, which included a model of the abdominal anatomy that could be viewed in 3-D using a head-mounted display. The early simulators, such as Satava's and others [5], provided visual feedback only. However, as more effective haptic devices became available, force feedback was incorporated into surgical simulations. Initial models of the human organ were based on simple lumped parameter models [6, 7]. However, as computing speeds increased, physically based finite element models [8-10] and other meshless methods [11] have been used to model the human organ. Also, part-task training simulators have been developed as tools for training perceptual motor and spatial skills without constructing complicated models of organs [12-15]. In fact, a part-task trainer named MIST VR is currently commercially available and is used sparingly in teaching hospitals. Another area of research that has relevance to this study is the measurement of transfer of training from the virtual environment. Virtual reality is being used in many disciplines as a method of training. Consequently, there have been a vast number of studies investigating training in the virtual environment.
Adams et al. [16] showed that virtual environment training improved performance in manual assembly tasks involving Lego blocks; however, force feedback did not appear to improve the training significantly. Earlier, Kozak et al. [17] showed that low fidelity simulations could lead to zero or negative training transfer to the real environment. Similar VR training transfer studies have been done with spatial navigation training [18, 19], post-stroke rehabilitation [20], and flight simulations [21-24]. These studies showed that virtual reality training could have mixed results depending on the fidelity of the simulation and the task being trained for. On a related topic, Wagner, Howe and Stylopoulos [25] conducted experiments on the effect of force feedback levels on teleoperation performance. Operation with force feedback resulted in better performance than operation without; however, the difference between augmented (x1.5) and reduced (x0.75) force feedback was not significant.

1.3 Motivation

Previous research on surgical simulations has reached one shared conclusion: it is difficult to simulate the surgical environment accurately. First of all, it is almost impossible to model tool-tissue interaction exactly. Human organs are nonlinear, anisotropic, visco-elastic, non-homogeneous, and their boundary conditions are not well known. On top of all this, there are physiological effects that are difficult to model, such as breathing and blood flow. Also, to model organs accurately, in-vivo material properties of human organs need to be known, and these are not easy to measure. Furthermore, even if an accurate model can be established, real-time simulation requires very fast computation. As the model becomes more complicated, a longer computation time is required to render the object. Therefore, modeling the interaction between the organ and the surgical tool is not trivial.
It is safe to say that an exact real-time simulation of tool-tissue interaction cannot be accomplished in the near future. However, it turns out that an exact simulation is not required. Due to the learning abilities and perceptual limitations of the sensory, motor, and cognitive systems of the human user, a perfect simulation is not necessary. Furthermore, given the large variations in human anatomy and physiology, there is no single perfect model, and wide variations exist in the geometry and material properties of organs. In fact, it has been seen in many other domains where virtual reality is used as a training method that a low fidelity simulation can give positive training transfer. The main question is how simple a simulation we can get away with for surgical simulations, while at the same time preserving the level of fidelity between the virtual and real organ behavior that leads to positive training transfer. A study measuring training transfer under various levels of fidelity has not been conducted up to this point. The difficulty lies in measuring the improvement of real world surgical skills from virtual environment training. Measuring surgical skills within the virtual environment can be done easily. However, some of these skills might pertain to the virtual environment alone, and the training may have no significant, or even negative, effects on real world skills. Therefore, what is truly required is the measurement of improvement in real world skills. The ideal way of measuring this would be to have simulators of various fidelity, train novice surgeons on these simulators alone, and see how the subjects' surgical skills differ during real surgery. However, the surgical trainers currently available are not mature enough for such a study to take place without risk to the patient. A method that is currently widely used to validate low fidelity simulators is to measure the inverse transfer of training [26].
This involves comparing the performance of an expert surgeon with that of a novice surgeon on the simulator; if the expert surgeon performs significantly better, the simulator is deemed effective [27, 28]. However, such inverse arguments are not sufficient to conclude that the simulator gives positive training transfer to the real environment. Therefore, there is a need for an alternate method that measures training effectiveness directly. As indicated previously, a low fidelity surgical simulation could be effective in training a surgeon. However, questions remain: how accurate do the haptics have to be for the simulation to be effective? How accurate do the graphics have to be? There are also questions that do not concern fidelity, such as: what is the best method (part-task vs. whole-task) of training? What is the most effective method of feedback? Questions such as these are fundamental issues that need to be addressed before any further development of surgical simulators can be done. To answer these questions, a method of measuring training effectiveness under various training conditions is required. However, there is a missing link between the virtual and real environments that precludes the measurement of training effectiveness for laparoscopic surgical simulators. The research done for this thesis proposes to provide this missing link.

1.4 Paradigm

A two-station experiment platform was set up as the test bed. The two stations are laparoscopic stations with analogous surgical tasks. However, in one station the surgical tasks are performed on real world objects, whereas the other station contains virtual objects. The real world station contains simple inanimate objects, for which the shape and material properties are well known. This allows the real objects to be modeled very accurately in the virtual environment. Therefore a very high fidelity simulation of the real environment is possible.
Thus, this provides an opportunity to vary the fidelity of the simulation from very high to low.

Figure 1.1: Experimental setup with the real station and virtual station

With this setup, it is now possible to measure training transfer from the virtual training environment to the real environment. The real world performance of a subject before and after VE training can now be measured, since the same surgical tasks are performed in both stations. Therefore, various factors can be varied within the virtual environment to see their effect on real world skills. These factors include haptic fidelity, graphic fidelity and type of training, among others. This experiment platform therefore provides the missing link between the virtual and real environments that is key to answering the fundamental questions concerning training effectiveness.

1.5 Analogies with Flight Simulator Development

Although this is one of the first studies on measuring training transfer for VR surgical simulators, there has been a great deal of research on measuring training transfer in other domains of VR training, as previously mentioned. Flight simulators are a good example of a case where there has been extensive research on the training effects of virtual environments [29-31]. The requirements for a faithful simulation are the same for both surgical and flight simulators:

(1) A complete model, expressed mathematically, of the response to all inputs from the operator
(2) A means of solving these equations in real-time
(3) A means of presenting the output of this solution to the operator through haptic, visual and aural responses

Due to the long history of flight simulation, flight simulators have matured to a stage where they are commonly and effectively used today for training novice aviators.
However, when early flight simulators were being developed in the first half of the 20th century, engineers faced questions similar to those we face today for surgical simulations. The physics of aviation was not known well enough to produce an accurate model, and sensor, actuator and computing technology was not developed enough to provide satisfactory real-time responses. Therefore, the early simulators were very low fidelity models of flight [32]. The central issue of whether these low fidelity simulations could achieve positive training transfer was as pressing then as it is now for surgical simulations. The analogy does not end there. The obstacles to measuring training transfer were similar. The ideal way of measuring training transfer would have been to train a novice pilot solely on the simulator and see how well they performed in real flight situations. However, such a study was not realistic due to the risk of accidents and the availability and cost of aircraft. The only method of validating the simulators was inverse transfer of training, where expert pilots attested to the similarity of the simulator to real flight [32, 33]. However, such methods are never enough to show the effectiveness of a simulation as a training tool. Consequently, the early simulators did not have much of an impact on flight training because most aviators were not convinced of their usefulness. This is similar to what has been observed so far with surgical simulation, where surgeons are not completely won over by the effectiveness of training with low fidelity simulations. The initial uncertainty about the training effectiveness of flight simulators was answered over time. By WWII the mechanics of flight was much better understood, producing more accurate mathematical models of flight.
There was also significant improvement in simulator technology, from the initial mechanical devices to more complex electromechanical systems with analog and eventually digital computation. Such development ultimately produced very accurate simulations of flight, to the degree that there is no doubt today that flight simulation is an effective tool for training. In fact, present day flight simulations are at the core of flight training, with strict FAA regulations governing the fidelity requirements [34, 35]. With the development of surgical simulations still in its infancy, there are several lessons that can be learnt from the history of flight simulations. For instance, because there was no easy way to measure training effectiveness, a great deal of money and man-hours was spent perfecting flight simulations using state-of-the-art technology. However, in many cases the improvements did not enhance training effectiveness, due to the limits of human learning ability and sensorimotor skills. Although a more accurate simulation may have been achieved, in these cases resources were wasted. Such waste could have been avoided if training effectiveness studies had been done to determine the level of fidelity required for the desired degree of training. This lesson is central to this study, and it is why measuring training transfer at an early stage is important for funneling resources in the right direction for surgical simulators. History also provides encouraging signs: the many similarities in development issues give hope that one day surgical simulators will play as central a role in training surgeons as flight simulators do in flight training today.

2 Design of Platform and Experimental Methods

The first part of this chapter describes how the experiment platform was designed and built. Once the platform was set up, experiments were done to validate the usefulness of the platform.
The latter part of this chapter describes how these experiments were designed and performed.

2.1 Design of the Experiment Platform

Laparoscopic surgery is minimally invasive surgery performed on the abdominal region. Surgical simulation development at the Touch Lab mainly focuses on simulators for laparoscopic surgery. Therefore, the training transfer experiments for this study were also done in a laparoscopic setup.

2.1.1 Overview

The experiment platform consists of two stations. The real environment laparoscopic setup acts as the testing station and the virtual environment setup acts as the training station.

Figure 2.1: Experiment setup showing (a) the real environment test station and (b) the virtual environment training station

In the real environment station a rubber model of an abdominal wall (Limbs and Things, Inc.) covers the operation scene. Two laparoscopic tools are inserted through the abdominal wall via trocars, as in real laparoscopic procedures. The objects to be operated on are instrumented and placed inside the abdominal area. A laparoscope, attached to a camera, is also inserted through the abdominal wall to provide visual feedback on a CRT screen, again as in real surgery. The station is set up so that the subject can perform laparoscopic tasks on the objects while data from the tasks are recorded. The virtual environment station includes the same rubber abdominal wall, with surgical tools and trocars inserted at the same positions as in the real station. However, underneath the abdominal wall, the surgical tools are connected to Phantom (SensAble Technologies) haptic interface devices (Figure 2.2). The objects are generated in the virtual world by the computer and displayed to the user graphically through the computer monitor and haptically through the Phantom devices.
Both the real station and the virtual station contain foot pedals to activate the harmonic scalpels on the ends of the surgical tools.

Figure 2.2: Two phantoms (a 1.5A model and a 1.0A model) connected to the end of the surgical tools in the virtual station.

For real-time simulation to be possible with the computer and Phantom devices, the system shown in Figure 2.3 had to be set up. The Phantom devices were connected to a 550 MHz Pentium III PC for this setup.

Figure 2.3: System for real-time simulation with both graphic and haptic feedback

As the figure shows, the force feedback device sends the tip position to the CPU; the processor determines whether or not the tip is in collision with a virtual object and calculates the relevant deformation and force feedback for the object. This information is sent to the graphic and haptic interfaces to be displayed to the user. For the simulation to seem realistic in real-time, a refresh rate of 1 kHz was used for the haptics and 30 Hz for the graphics. The rubber abdominal walls were used in the two stations because it was determined that the horizontal resistance forces exerted by the wall on the tools were very important. Laparoscopic surgeons attested that the forces from the abdominal wall at the trocar are the dominant forces during surgery, much greater than the smaller forces between the tools and the organs. It would be technically challenging to simulate both the abdominal wall forces and the contact forces at the same time. Therefore, a rubber abdominal model was used, so that only the forces between the tools and the objects underneath the wall need to be modeled. This type of hybrid model for laparoscopic simulation was used for the first time in this setup.
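The simulation loop just described (tip position in, collision detection, collision response out, at 1 kHz for haptics and 30 Hz for graphics) can be sketched as follows. This is a minimal illustrative sketch, not the thesis's actual code: the one-dimensional "virtual object", the placeholder stiffness `K`, and all function names are assumptions introduced here.

```python
# Sketch of the dual-rate loop of Figure 2.3 (illustrative only).
# The 1 kHz haptic and 30 Hz graphic rates come from the setup described
# above; the planar virtual object and its stiffness are placeholders.

HAPTIC_HZ = 1000   # haptic (force) refresh rate
GRAPHIC_HZ = 30    # graphic refresh rate
K = 0.01           # placeholder stiffness, N/mm (order measured for a pig esophagus)
SURFACE_Z = 0.0    # the virtual object occupies the half-space z <= 0

def haptic_step(tip_z_mm):
    """One 1 kHz cycle: collision detection plus collision response.

    Returns the reaction force (N) the Phantom should display.
    """
    penetration = SURFACE_Z - tip_z_mm      # collision detection
    if penetration <= 0:
        return 0.0                          # tool tip is free of the object
    return K * penetration                  # linear-spring collision response

def run(tip_trajectory_mm):
    """Run haptic cycles; refresh graphics every HAPTIC_HZ // GRAPHIC_HZ cycles."""
    forces, frames = [], 0
    for i, z in enumerate(tip_trajectory_mm):
        forces.append(haptic_step(z))       # 1 kHz: force out to the Phantom
        if i % (HAPTIC_HZ // GRAPHIC_HZ) == 0:
            frames += 1                     # ~30 Hz: redraw the scene
    return forces, frames
```

In the real system both loops run concurrently against the device driver; the point of the sketch is only the rate split, with the cheap force computation serviced far more often than the display.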
This hybrid system seems to be the most effective setup for laparoscopic simulation to date, under the assumption that the purchased rubber abdominal model is a faithful model of the real abdominal wall.

Figure 2.4: Hybrid system which produces "virtual" forces from the Phantom and "real" forces from the abdominal wall.

Finally, the proposed experiment procedure using the setup was as follows. A subject would initially be tested on the real environment test station to measure their skill level prior to training. Once tested, they would be trained for several sessions on the virtual reality trainer. After the training was complete, the subject would be brought back to the real environment to be evaluated a final time. The measure of training effectiveness would be the increase in skill level within the real environment. Using this setup, various factors can be adjusted within the virtual environment to observe their real-world training effect.

2.1.2 Design of Task

With the hardware in place, it was important to choose appropriate tasks for the subjects to be trained and evaluated on. Numerous procedures are performed by laparoscopic techniques, and most are combinations of a set of commonly performed sub-tasks. Choosing the right combination of sub-tasks was crucial for this study to be relevant and the results to be meaningful. The following criteria were used to select the final task.

(1) The tasks had to be relevant to laparoscopic surgery. Although the operations were performed on inanimate objects and not human organs, the actions had to be very similar to what is done in real laparoscopic surgery. This was mainly verified by expert laparoscopic surgeons from MGH, who were collaborators on this project.

(2) The chosen task needed to provide a graded "mimicability" of the real tasks in the virtual environment.
One of the main areas of interest for this training effectiveness study was how environment fidelity affects training. Therefore, it was necessary to be able to adjust the fidelity within the virtual environment. Choosing tasks and objects that could be modeled to varying degrees of accuracy, both haptically and graphically, was important.

(3) The tasks needed to have an appropriate level of difficulty so that a suitable number of training sessions could take place. If the task were too easy, the subjects would be able to perform it without much training and it would be difficult to distinguish between the different training effects. On the other hand, if the tasks were too difficult, there would not be much improvement in subject skill level even after training, and again it would be difficult to see the effects of the different trainers.

(4) Finally, the tasks had to have a significant number of metrics so that the performance of the subject could be quantified during both training and evaluation.

Besides these criteria, there were some other desirable traits. It would be ideal if performance on the chosen tasks showed a small variance across subjects. Also, it was hoped that haptic feedback would play an important role in performing the chosen tasks, since the role of haptic fidelity in training is one of the main interests of this study.

In choosing a task that fits the above criteria, operations commonly done in real laparoscopic surgery were divided into the following building blocks of sub-tasks:

1) Positioning and orienting the tool
2) Obstacle avoidance
3) Palpation
4) Piercing
5) Cutting (scissors or harmonic scalpels)
6) Pushing (moving organs out of the way)
7) Pulling
8) Wrapping

A combination of these sub-tasks was chosen, but they could not all be combined into one main task. Therefore, some were included and some were not.
Pushing was chosen because it would be possible to have graded fidelity in modeling pushing operations. Positioning and obstacle avoidance were incorporated because these tasks are generic to all laparoscopic procedures. Wrapping was eliminated because it would be difficult to simulate accurately at high fidelity. Pulling was not chosen because it is similar to pushing, and piercing was eliminated because force feedback is minimal when performing the task. Cutting was chosen because it requires precise tool control and is a very important part of laparoscopic surgery.

In the end, a bimanual pushing and cutting task, similar to what is done in Heller's myotomy, was chosen. Heller's myotomy involves cutting muscle fibers in the esophagus to relieve the constriction that can cause difficulty in swallowing. The scene and task chosen for the experiment platform are shown in Figure 2.5.

Figure 2.5: Sequence of operation, panels (a)-(h), for the tasks chosen in the experiment platform. The figure shows the tasks done in the virtual environment for ease of presentation. The tasks and objects in the real environment station are the same.

There are three layers in the scene. The top red layer is the obstacle in this task; subjects were instructed to avoid touching this layer as much as possible. The layer below the obstacle consists of two spring-loaded sliding blocks. The blue block (top block) can be pushed to the right and the pink block (bottom block) to the left. When the blocks are pushed, the bottom layer is uncovered. The bottom layer has a rectangular grid area where the incision marks are to be made. The sequence of operation is as follows. (a) The scene is shown in its original configuration with the tools in their starting positions. (b) The left hand tool is used to push the spring-loaded blue block, uncovering the grid underneath.
(c) The edge of the block, once pushed, must be maintained between two finely spaced lines (the red warning line and the outer edge of the rectangular grid mark). The block is then transferred from the left hand tool to the right while its position is maintained between the two lines. (d) Once the tool transfer is complete, the left hand tool becomes free. Pressing the left pedal activates the left scalpel, and an incision mark is made on the first grid with the scalpel activated. The incision mark has to be as straight and as long as possible without going outside the boundaries of the grid. Also, the harder and longer the tool is pushed, the thicker the incision mark comes out, so the mark has to be kept as thin and as consistent in thickness as possible. All of this must be done while maintaining the correct position of the block. (e) With the first incision mark completed, the block is released and the tools return to the start position. The same actions from (b) to (d) are repeated for the next incision mark on the second rectangular area under the same blue block. The order of incision is top left rectangle, bottom left, top right, then bottom right. When incision marks are to be made on one of the top rectangles, the left tool should push the top of the block and the right tool should then hold the bottom of the block, so that the tools do not collide when making the incision. The same applies when making incisions on the bottom rectangles, with the roles reversed. (f)-(h) Once all four incision marks are made under the blue block, the subject switches hands and performs the same tasks on the pink block. The subject returns the tools to the start position after each incision mark. A total of eight incisions are made, four with the left hand and four with the right.
2.1.3 Implementation of Tasks

With the task selected, it needed to be implemented in both the real environment test station and the virtual environment training station. For the real station, two plastic blocks were mounted on linear sliding bearings to serve as the moving doors. The blocks were spring loaded with tissue-like material purchased from Limbs and Things, Inc. DC 750-1000 LVDTs (Linear Variable Differential Transformers) from Macro Sensors were calibrated and fixed onto the blocks so that the displacement of the blocks could be recorded. The LVDTs sent signals to a Data Translation DT 300 A/D card, so that the signals could be read, plotted and saved by the computer. The top layer of the scene was manufactured by water-jetting the desired pattern from an aluminum sheet, bending it to the desired shape and attaching a layer of tissue-like material to the top surface.

Figure 2.6: The operation scene in the real environment test station. The blocks are mounted on linear sliding bearings, spring loaded and connected to LVDTs.

A pen tip, with its own ink source, was installed at the end of the tool. The grid was made of ink-absorbing paper and placed underneath the sliding blocks. Therefore, the incision marks would become thicker as the user pushed harder and longer on the grid, which was what was desired. After each test session, the grid would be removed and the incision marks digitized with a scanner. The straightness, length, thickness and position of the digitized incisions could then be evaluated using algorithms written with MATLAB's Image Processing Toolbox.

Figure 2.7: Surgical tool with pen tip installed at the end.

Implementation of the task in the virtual environment mainly involved software issues. First, all the shapes, dimensions and masses of the real environment were accurately measured so that they could be exactly reproduced in the virtual environment.
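The image-based incision metrics can be illustrated with a short sketch. The thesis used MATLAB's Image Processing Toolbox; the NumPy version below is a hypothetical reimplementation of the idea only, analyzing a binary ink mask column by column to obtain straightness, thickness and length measures of the kind described above.

```python
import numpy as np

def incision_metrics(mask):
    """Per-column analysis of a binary incision image (rows x cols).

    Returns (straightness, mean_thickness, thickness_std, length):
    straightness is the standard deviation of the mark's center line,
    thickness is the count of ink pixels per column, and length is the
    number of columns containing ink. Illustrative sketch only.
    """
    cols = [c for c in range(mask.shape[1]) if mask[:, c].any()]
    centers, thicknesses = [], []
    for c in cols:
        rows = np.flatnonzero(mask[:, c])
        centers.append(rows.mean())     # center of the mark in this column
        thicknesses.append(len(rows))   # ink pixels = local thickness
    return (float(np.std(centers)),     # straightness (lower is straighter)
            float(np.mean(thicknesses)),
            float(np.std(thicknesses)),
            len(cols))                  # length along the incision
```

A perfectly straight, one-pixel-wide mark would score zero on both standard-deviation measures, which matches the scoring convention used later, where lower is better.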
Traditional point-based collision detection was used for detecting collisions between the tool tip and the virtual objects (the collision detection algorithm already available in the GHOST SDK, SensAble Technologies, was used). On top of this, a ray-based collision detection algorithm was developed specifically for this task so that it could be determined when the sides of the tools hit the edge of the obstacle. Conventionally, ray-based collision detection is only used when six-degree-of-freedom force feedback devices are available. Here, for the first time, ray-based collision detection was implemented for three-degree-of-freedom devices. This was possible because the rubber abdominal wall acts as a pivot for the tool. (See Appendix 1 for details.)

The most important step in implementing the task in the VE was modeling the dynamic block. The dynamics of the block were modeled by the equation

m ẍ + f_b(ẋ, t) + f_k(x, t) = F_tip    (2.1)

where m is the mass of the block, f_b the damping force, f_k the spring force and F_tip the force applied by the tool tip. As this equation shows, the dynamics of the block can be modeled as linear or nonlinear, time-variant or time-invariant, with or without damping, and so on. The equation therefore provides a means of adjusting the fidelity of the haptic model, which makes studies of the effect of haptic fidelity possible.

2.1.4 Performance Measurements

With the task selected and implemented, a set of performance metrics was required to quantify the performance of the subjects. Traditionally, the measurement of surgical skill is more qualitative than quantitative. However, for this study, a numeric measurement of performance was necessary for a more absolute evaluation of skill improvement. There have in fact been numerous prior studies on establishing metrics for laparoscopic surgery [13, 15, 36-40]. However, those performance measurements were relevant to particular tasks or involved establishing hidden Markov models.
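As a concrete illustration of how equation (2.1) permits graded fidelity, the sketch below integrates the block dynamics with a swappable spring term. All coefficients and the semi-implicit Euler integrator are assumptions for illustration, not the thesis values or implementation; the point is that exchanging `f_k_linear` for `f_k_nonlinear` is exactly the fidelity knob the equation provides.

```python
import math

# Illustrative integration of Eq. (2.1): m*x'' + f_b(x', t) + f_k(x, t) = F_tip.
# Swapping f_k between a linear and a nonlinear model is the fidelity knob.
# All coefficients below are made up for illustration.

def f_k_linear(x, k=0.03):
    return k * x                          # N; linear elastic model

def f_k_nonlinear(x, a=0.14, b=0.06):
    return a * (math.exp(b * x) - 1.0)    # N; stiffening exponential model

def simulate(f_k, F_tip=0.5, m=0.01, b=0.005, dt=0.001, steps=2000):
    """Semi-implicit Euler integration of the pushed block, with viscous
    damping f_b = b*x'. Returns the displacement x after `steps` ticks."""
    x, v = 0.0, 0.0
    for _ in range(steps):
        acc = (F_tip - b * v - f_k(x)) / m   # Newton's second law
        v += acc * dt
        x += v * dt
    return x
```

The same loop runs for either spring model, so the only difference a trainee feels is the force profile of `f_k`, mirroring how the high and medium fidelity trainers differed only in the spring term.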
For this experiment platform, an original set of metrics was established to gauge performance, shown in Table 1.

Performance Measurement | What is measured | Significance
Time | Completion time; inter-task time | Measures economy of effort and coordination.
Push Accuracy | Block position | Measures how well the subject controls the force and position of the tool tip.
Cut Accuracy | Straightness; depth; depth consistency; accuracy | Measures how well the subject orients, controls, and coordinates the two tools for accurate incisions.
Tool Control | Obstacle avoidance; positioning of tool | Measures general positioning and control of the tool prior to pushing and incision.

Performance = s1 T + s2 P + s3 C + s4 O

Table 1: Performance metrics

The total score was a scaled, normalized sum of the above four metrics. The scales were determined through discussions with expert laparoscopic surgeons from MGH about what is important in surgery; the total performance score tried to reflect what expert surgeons defined as good surgical operation. In the end, a 1-4-3-2 weighting was given to time, cut accuracy, push accuracy and tool control, respectively. Push accuracy and cut accuracy were given the highest weightings because these were tasks where tool coordination and depth perception were important, and because these two metrics described the success or failure of the main objective of the tasks. Time is not an important factor in most laparoscopic procedures, since most cases are not emergencies in which speed is required. Therefore, time was given a relatively small weighting.

Before the start of each experiment the subjects were given a detailed description of the task. They were also advised on how their performance would be scored, including the weighting of each metric.
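The weighted total reduces to a one-line computation. The sketch below assumes already-normalized sub-scores and uses the 1-4-3-2 weighting described above; the dictionary key names are illustrative, not identifiers from the thesis.

```python
# The 1-4-3-2 weighting described above: time 1, cut accuracy 4,
# push accuracy 3, tool control 2. Key names are illustrative.
WEIGHTS = {"time": 1, "cut_accuracy": 4, "push_accuracy": 3, "tool_control": 2}

def performance_score(subscores):
    """Weighted sum Performance = s1*T + s2*C + s3*P + s4*O over normalized
    sub-scores. In the thesis' convention a lower score is better."""
    return sum(WEIGHTS[name] * value for name, value in subscores.items())
```

For example, a subject with normalized sub-scores of 0.5 for time, 0.25 for cut accuracy, 0.0 for push accuracy and 0.5 for tool control would receive a total of 2.5.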
2.2 Design of Validating Experiments

2.2.1 Factors that Affect Virtual Environment Training

The above section concludes the description of the experimental setup. What remains is a preliminary round of experiments to show the usefulness of the test bed. Before any experiments could be done on the platform, however, we needed to consider the factors that can affect training, so that it could be determined which factors needed to be experimented with in order to clarify their role in training.

Figure 2.8: Factors that affect general virtual environment training: interface fidelity (the haptic device's inertia, stiffness, friction and bandwidth), environment fidelity (haptic and graphic accuracy of the mechanical model, photo-realistic texture) and training method (part-task vs. whole-task, frequency and duration).

As Figure 2.8 shows, in all training environments there is some type of interaction between the trainer and the user. Many factors can affect the quality of training, from both the user side and the trainer side. Therefore, both the trainer's ability to deliver an effective mode of training and the user's ability to respond to the training are important. For this study, however, only the factors that affect training from the trainer side are under consideration. There are three major factors that can affect training from the trainer side: interface fidelity, environment fidelity and training method.

Interface fidelity is influenced by the accuracy of the haptic and graphic interfaces. Inaccuracies can arise from the inertia, stiffness, resolution, friction and bandwidth of the haptic interface and the resolution and refresh rate of the graphic interface, among others. Environment fidelity is determined by how accurately the real environment is modeled physically in the virtual environment. The accuracy of the environment can be divided largely into haptic accuracy and graphic accuracy.
These mainly involve shape, texture, force and deflection for the haptics, and color, texture, shade, shape and deflection for the graphics. The third factor shown in the figure is the training method, which is general to all forms of training and not just VR training. Factors such as the frequency and duration of training and the type of training (part task or whole task) are categorized under training methods.

The roles of all the factors noted in Figure 2.8 can be investigated using the experiment platform. For the first round of experiments to validate the effectiveness of the platform, two of the above factors were investigated: the effect of haptic fidelity on training transfer and the effectiveness of part task training versus whole task training.

2.2.2 Haptic Fidelity

There are many aspects of haptic fidelity that could be experimented with. One of the key questions that remains about haptic fidelity is how accurately the nonlinearity of material elasticity needs to be modeled. Human organs act as nonlinear springs when they are pushed or pulled. However, their actual force-displacement properties are not well known; characterizing the in-vivo force-displacement properties of organs is not a trivial problem. Therefore, the majority of surgical simulations up to this point have used simple linear elastic models. Efforts are continuously being made to characterize tissue properties accurately. However, it is not known whether the user of a surgical simulator can actually tell the difference between a linear and a nonlinear spring. Even if they could, the difference in training effect may not be significant. Therefore, for the first experiments on the platform, the effect of modeling a nonlinear elastic material to varying degrees of accuracy was investigated.

For this investigation a nonlinear spring was loaded onto the sliding blocks in the real station.
The material used for the spring was tissue-like rubber material purchased from Limbs & Things, Inc. The force-displacement data for the spring and block assembly was measured using standard weights on a low-friction pulley and is plotted in Figure 2.9.

Figure 2.9: Force-displacement plot for the rubber spring material. The figure shows the typical behavior observed in most tissue material, where the stiffness increases as the displacement increases.

For the fidelity study, two approximations were made for this curve: a linear approximation and a nonlinear approximation, both obtained using a least-squares curve fitting technique.

Figure 2.10: (a) Linear approximation to the force-displacement curve; (b) nonlinear (exponential) approximation to the force-displacement curve (fitted exponential with coefficient 0.1391, R^2 = 0.9973). The order of magnitude of the stiffness is in agreement with what has been measured experimentally for a pig's esophagus (0.005 N/mm - 0.02 N/mm).

With the above approximations, three different levels of haptic fidelity were possible for modeling the spring in the training station. The nonlinear approximation was the most accurate, high fidelity model; the linear approximation was the medium fidelity model; and the lowest fidelity was a model without force feedback. For the haptic-less model, the block would only move graphically as it was being pushed. Comparison between the training effects of these three models would give insight into the role of haptic fidelity in training.

Some predictions could be made from prior knowledge even before the experiments were done. For the sensory resolution of the hand, the JND (Just Noticeable Difference) for force varies from 5-15% depending on the force magnitude, muscle system and experiment method [41].
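The two least-squares fits can be sketched as follows. The data here are synthetic stand-ins for the pulley measurements, and the exponential coefficients are illustrative, not the thesis values; the exponential model is fit in log space, a standard linearization trick.

```python
import numpy as np

# Synthetic force-displacement data of the form f = a*exp(b*x); the real
# data came from hanging standard weights on a low-friction pulley.
# Coefficients a = 0.14, b = 0.06 are illustrative, not the thesis fit.
x = np.linspace(1.0, 40.0, 40)          # displacement (mm)
f = 0.14 * np.exp(0.06 * x)             # force (N), noise-free for clarity

# Linear approximation: f ~ k*x, least squares through the origin.
k = float(np.dot(x, f) / np.dot(x, x))

# Nonlinear (exponential) approximation: log f = log(a) + b*x is linear
# in x, so an ordinary least-squares line fit recovers a and b.
b_fit, log_a = np.polyfit(x, np.log(f), 1)
a_fit = float(np.exp(log_a))
```

With noise-free exponential data the log-space fit recovers the generating coefficients exactly, while the linear fit necessarily under- and over-shoots across the operating range, which is precisely the model discrepancy the fidelity experiment exploits.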
The resolution deteriorates at forces below 0.5 N, with a minimum resolution of 0.06 N at these small forces [42]. The maximum force difference in the operating region between the linear and nonlinear approximations is 0.16 N. This difference would be amplified three to four times at the hands because of the long, pivoted tool. Also, the stiffness difference between the two models varies from 0-70% in the operating region, while the JND for stiffness is known to be 23% [43]. Therefore, if a simple discrimination experiment were done, the subject would most likely be able to discriminate between the nonlinear and linear elastic models. However, whether this small difference has any effect on training is another matter, and that is why it is worth investigating. The initial prediction was that training would not be affected by the small difference in linearity, for two reasons. First, there are additional forces on the tool, from the friction of the haptic device and the abdominal wall, that may be much larger than the contact forces. Second, the subjects are concerned with performing the task rather than with trying to discriminate between the two models.

2.2.3 Part Task vs. Whole Task Training

A second experiment, comparing the effectiveness of part task training with whole task training, was done in parallel with the haptic fidelity experiment. Part task training is training done one sub-task at a time, where the desired complete set of tasks is a combination of the sub-tasks. Part task training is used in many fields where training is commonly required, such as sports, aviation and surgery, and it has been found to be especially effective for training beginners with no prior skill. Experiments were done to determine whether this also holds for laparoscopic surgery training.

The main task described in section 2.1.2 was divided into four part tasks: positioning, pushing, cutting and obstacle avoidance.
The subjects would train on one part task at a time. It was important to design the part task training sessions such that the total amount of training received, in terms of the actions needed and the time taken, was the same as in whole task training. Otherwise a valid comparison could not be made between the effects of part task and whole task training. The positioning task consisted of simply positioning the tool tip at desired locations. The obstacle avoidance task involved placing the tools in position to push the blocks without touching the upper layer. The pushing task was composed of pushing the block to the desired location and maintaining the location while transferring the tool, as done in the whole task. Finally, the cutting task was simply making incision marks on the grid as straight, long, thin and consistent as possible. At the end of the training the subjects would be evaluated on the whole task in the real environment, not on the part tasks.

2.3 Experimental Procedure

Twenty subjects with no prior surgical training participated in the first round of experiments. The subjects were given the same detailed description of the tasks and scoring. Once they consented to the experiment and the briefing was complete, they were initially evaluated at the test station performing the described tasks. The subjects were then divided into five initial skill level groups depending on their initial performance score. One subject from each skill level group was assigned to each of the four training treatments. This set up the 5 x 4 matrix shown in Table 2, with which a two-factor analysis could be done on the results, under the assumption that the two factors affecting training performance are the initial skill level and the type of trainer used.
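Under that assumption, the analysis amounts to a two-way ANOVA without replication, with initial skill level as the blocking factor and trainer type as the treatment. The function below is an illustrative sketch of that standard computation, not the analysis code used in the thesis.

```python
import numpy as np

# Two-way ANOVA without replication for a skill-level x trainer layout
# like Table 2: rows = initial skill groups, columns = trainer types.
# Returns the F statistic for the trainer (column) factor.
def trainer_f_statistic(data):
    data = np.asarray(data, dtype=float)
    r, c = data.shape
    grand = data.mean()
    ss_rows = c * ((data.mean(axis=1) - grand) ** 2).sum()    # skill-level effect
    ss_cols = r * ((data.mean(axis=0) - grand) ** 2).sum()    # trainer effect
    ss_err = ((data - grand) ** 2).sum() - ss_rows - ss_cols  # residual
    ms_cols = ss_cols / (c - 1)
    ms_err = ss_err / ((r - 1) * (c - 1))
    return ms_cols / ms_err
```

Blocking on initial skill removes the large between-subject variance from the error term, which is why the design tolerates the wide spread of starting skill levels among the twenty subjects.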
Initial Skill Level | Nonlinear Haptics | Linear Haptics | No Haptics | Part Task
1 | N1 | L1 | NH1 | P1
2 | N2 | L2 | NH2 | P2
3 | N3 | L3 | NH3 | P3
4 | N4 | L4 | NH4 | P4
5 | N5 | L5 | NH5 | P5

Table 2: Experimental design table showing how the subjects were divided across initial skill levels (rows) and trainer types (columns). N1, N2, etc. represent the subjects.

The training for each subject lasted seven sessions. Each session involved completing the task on the virtual trainer from beginning to end. For the part task training, a session consisted of going through each part task exercise once. Once the training was complete, the subjects were brought back to the real environment test station to be evaluated a final time. The measure of training effectiveness was the improvement in skill between the initial evaluation and the final evaluation, both conducted on the test station:

Training Effect = perf_final - perf_initial    (2.2)

The details of the tool path, block position, time to completion and incision marks during the training and testing sessions were recorded as described in the previous sections. These data were then used to calculate the performance scores needed for evaluating the results.

3 Experimental Results

In this chapter, the results from the first round of validating experiments are presented. The results are from 20 subjects who were divided into four training treatment groups. Each group was trained with one of the three levels of haptic fidelity, or by part task training.

3.1 Results from Testing Station

Figure 3.1 shows the data from the real environment testing station. The data were recorded for the initial skill level test and the final test, before and after training, for each subject. The data required from the test station were the block position, incision mark accuracy, time taken and number of obstacles hit.
Figure 3.1: Sample of raw data from the real environment test station for subject P3, before and after training. The data show (a) the profile of the block position, (b) the incision marks, (c) the time (236.9 before training, 136.2 after) and (d) the number of obstacles hit (25 before, 9 after).

The graphs in part (a) of Figure 3.1 are the profile of the block position; the two green (horizontal) lines mark the boundaries within which the block was supposed to be maintained. The scanned image in part (b) shows the digitized incision marks on the grid.

3.2 Total Score and Skill Improvement

The total scores before and after training were calculated from the raw data using the metrics described in section 2.1.4. Table 3 is a sample table showing the scores for the five subjects who received training with the nonlinear elastic model. The score for pushing was calculated by integrating the area of the block profile that lay outside the desired region; the number of times the block slipped off the tool was added to the push accuracy score. The cut accuracy score also had several components: the straightness was measured by the standard deviation of the center of the incision, and the mean thickness and thickness consistency (standard deviation) were also determined. These values were combined with the length of the line and the number of pixels outside the rectangular boundary to produce the total score for cut accuracy. Each sub-score was normalized so that the inner 75th percentile would lie between 0 and 1. The sub-scores were weighted, as described previously, and then summed to produce the overall score. A lower score signified a better performance in this scoring system.
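The percentile-based normalization can be sketched as follows. The thesis does not specify exactly how the bounds of the "inner 75 percentile" were placed; the sketch assumes a symmetric central band (12.5th to 87.5th percentile), which is one plausible reading.

```python
import numpy as np

# Rescale a raw sub-score so that the inner 75% of the sample spans [0, 1].
# The symmetric 12.5th/87.5th percentile bounds are an assumption; the
# thesis states only that the inner 75 percentile maps to the unit range.
def normalize_subscore(raw):
    raw = np.asarray(raw, dtype=float)
    lo, hi = np.percentile(raw, [12.5, 87.5])
    return (raw - lo) / (hi - lo)   # values outside the band fall outside [0, 1]
```

Normalizing against an inner percentile band rather than the min-max range keeps a single outlier subject from compressing everyone else's scores, which matters in a sample of only twenty subjects.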
[Table 3 lists, for subjects N1-N5, the initial and final skill level test sub-scores: time (inter-task time, task time, normalized total), cut accuracy (straightness, thickness consistency, thickness deviation, length, pixels outside the lines, normalized total), push accuracy (red-line crossings, crossings back, block losses, normalized total), tool control (obstacles hit, missed combinations, normalized total) and the weighted total score.]

Table 3: Sample scoring table for the evaluation tests conducted on the real environment test station
before and after training using the nonlinear elastic model (for five subjects).

Figure 3.2 shows a bar chart of the total scores, before and after training, as calculated in Table 3. As mentioned previously, a smaller bar represents a better performance.

Figure 3.2: Bar chart of the total performance scores before and after training using the nonlinear elastic model.

The results from the other trainers, namely the two lower fidelity trainers and the part task trainer, were evaluated in the same manner. A scoring table identical to Table 3 was constructed from the raw data for each of the other three trainers, and the total scores were again plotted as bar charts, shown in Figure 3.3.

Figure 3.3: Bar charts of the total score before and after training for the (a) linear haptic trainer, (b) no-haptics trainer and (c) part task trainer.

As mentioned in the previous chapter, training effectiveness was calculated by subtracting the initial performance score from the final performance score. This improvement in skill level is shown in Table 4 for each subject.

Training Effect = perf_final - perf_initial    (3.1)

Initial Skill Level | Nonlinear Haptics | Linear Haptics | No Haptics | Part Task
1 | 1.502 | -0.214 | -0.154 | 1.267
2 | 3.590 | 2.297 | 3.937 | 4.777
3 | 4.894 | 3.173 | -0.091 | 4.324
4 | 6.031 | 4.735 | 3.712 | 4.209
5 | 7.183 | 7.692 | 6.561 | 10.356

Table 4: Improvement in performance score (training effect) after virtual reality training.

3.3 Learning Curve for Training

The performance score could also be measured during the training sessions on the virtual trainer. By plotting the performance for each of the seven training sessions, a learning curve could be charted for each subject. Figures 3.4-3.6 show the learning curves for the 15 subjects who received whole task training.
The learning curves for the five subjects who underwent part task training are not shown here, because there are no grounds for comparison between scores from part task operations and the whole task scores shown.

Figure 3.4: Learning curves for subjects who received training on the nonlinear elastic model VR trainer.

Figure 3.5: Learning curves for subjects who received training on the linear elastic model VR trainer.

Figure 3.6: Learning curves for subjects who received training on the VR trainer without haptics.

4 Analysis and Discussions

Analysis of the results shown in the previous chapter was carried out to see how each training treatment affected the performance improvement. Initially, general observations were made from the average improvement scores in each training group. Then, a statistical analysis was done to see whether the differences in training effect were statistically significant. Finally, the learning curves from the training sessions are discussed in the last step of the analysis. The analysis will show that the number of subjects as originally planned was not enough to provide definite conclusions. Therefore, additional experiments were done as an extension to the first round, and the combined results were more conclusive, as will be seen at the end of this chapter.

4.1 Training Effectiveness

The average improvement in performance was calculated for each of the four training treatment groups. The results are shown in Figure 4.1, from left to right, for the nonlinear elastic model (high fidelity), the linear elastic model, no haptics and part-task training.

Figure 4.1: Average training effectiveness plot.

The y-axis in the figure shows the improvement in performance score, and each group of bars on the x-axis represents a training method. The tallest bar on the right end of each group represents the average improvement in the total score.
The smaller bars to the left of the total show the improvement in each sub-score (time, pushing, cutting and obstacle avoidance). The total is the sum of the sub-metric bars to its left (the sub-metrics were plotted in their weighted form). The error bars on the chart represent the standard deviation of the scores. On first impression, the standard deviation appears to be very large. However, the experiment was designed with a two-factor analysis in mind. The standard deviation as plotted on this bar chart is expected to be large, because for each training treatment, subjects from five different initial skill level groups were used. Moreover, it was assumed from the outset that the initial skill level affects the improvement in skill. This was taken into account later in the two-factor statistical analysis. Therefore, for the time being, the error bars can be ignored in the discussion of the mean values.

One obvious trend that can be seen in Figure 4.1 is that as fidelity decreased, so did the improvement in total skill level. This supports the assertion that haptic fidelity, more specifically the fidelity in modeling nonlinear stiffness, plays an important role in laparoscopic training. In fact, not only is there a difference between training with haptics and without, there is also a notable difference between the training effects of the nonlinear model and the linear approximation. In the previous chapter it was predicted that the subjects might barely be able to discriminate between a linear and nonlinear spring of this magnitude, but that the difference in training effect would be negligible. However, the results seem to show otherwise. Not only did the subjects seem to be able to distinguish between the two models, the training effects of the two models seemed to differ.
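The mean improvements underlying Figure 4.1 can be recomputed directly from Table 4. The following is a short Python sketch; the group labels are shorthand introduced for this illustration, not names used elsewhere in the thesis.

```python
# Per-subject training effects from Table 4 (Eq. 3.1: perf_final - perf_initial),
# grouped by training treatment; list positions 1-5 are the initial skill groups.
table4 = {
    "nonlinear":  [1.502, 3.590, 4.894, 6.031, 7.183],
    "linear":     [-0.214, 2.297, 3.173, 4.735, 7.692],
    "no_haptics": [-0.154, 3.937, -0.091, 3.712, 6.561],
    "part_task":  [1.267, 4.777, 4.324, 4.209, 10.356],
}

# Average training effect per treatment group, as plotted in Figure 4.1.
means = {name: sum(v) / len(v) for name, v in table4.items()}
for name, m in means.items():
    print(f"{name:>10}: mean training effect = {m:.3f}")
```

The resulting means (about 4.64, 3.54, 2.79 and 4.99 respectively) reproduce the ordering discussed above: improvement falls as haptic fidelity falls, and part-task training is on par with the high-fidelity whole-task trainer.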
On the other hand, if the results are studied more closely, there are signs that the difference between the linear and nonlinear models is merely coincidental and more likely due to variance among subjects. Such an argument is supported by the fact that the bar charts for the sub-scores show that the main area of difference was in cutting accuracy. However, the cutting task was modeled exactly the same for both trainers. Therefore, two arguments could be made: the difference in score may simply be due to sample variance, or the spring modeling may have had an indirect effect on the cutting because the spring-loaded block had to be held while making the incision. Further insight can be obtained from the statistical analysis in the next section.

Another observation that can be made from Figure 4.1 is that the total skill improvement for part-task training is larger than those for whole-task training. The spring in the part-task trainer was modeled with the nonlinear fit. Therefore, the only direct comparison that can be made is between the nonlinear whole-task training (first set of bars) and part-task training (last set of bars). Although the total score improvement was greater for part-task training, the difference between the two was not significant, as will be shown later in the statistical analysis. What is interesting, however, is that the improvement in obstacle avoidance was notably greater in part-task training than in any of the whole-task training results. This could be attributed to the fact that when performing the total combination of tasks, the subjects concentrate mainly on controlling the block and making an accurate incision and care less about avoiding the obstacle. In part-task training, on the other hand, there is a training session devoted solely to avoiding obstacles, and thus the subjects become more accustomed to obstacle avoidance.
Whatever the case, it can be said that part-task training was an effective method of training for this set of tasks, especially since the training effect was greater than or equivalent to that of whole-task training even though the subjects were not trained even once on the complete combination of tasks.

There were also some notable trends in training without haptics. The average improvement in total score was smaller for those who trained without haptics, as expected. The score improvements for cutting accuracy were especially small compared to the other training groups. This was probably because the cutting task required the most precise tool control, and haptic cues during training were useful to the subjects in determining depth and controlling the tool tip. Another interesting result from the training without haptics can be seen in Figure 4.2.

Figure 4.2: Performance improvement shown with initial skill level group.

The above figure shows the total score improvement plotted for each subject, with the number above each data point representing their initial skill level group. A lower number therefore means that the subject performed better in their initial test. A very structured trend can be seen in the plot for the nonlinear and linear haptic training: the subjects who initially had the least skill showed the most improvement, and the initially most skilled subject showed the least improvement. The subjects in between are spaced fairly equally, in perfect order. For the training without haptics, however, the subjects are clustered here and there in random order. A similar effect can be seen for part-task training. What this shows is that the subjects responded to the training to varying degrees. For example, some subjects were not able to perform the task at all when there was no haptic feedback in the virtual environment.
These subjects may continue their training without much improvement, and this can be seen in some of the learning curves shown in chapter 3. On the other hand, some subjects may initially struggle on the trainer without force feedback but steadily master the technique as the number of training sessions increases. For those who were able to perform the task well without haptics, performing the task in the real environment test station became much easier in comparison. Subjects NH3, NH2 and NH5 from Figure 4.2 are examples of those who were able to fairly master the task without force feedback, and subjects NH1 and NH4 were those who struggled throughout. Similarly for the part task trainer, some subjects were able to perform the combination of tasks well from training on each part task separately, and some were not. It depended on how well the subject was able to translate the skill acquired for the part tasks into the skills needed for performing the whole combination of tasks. Ideally, a trainer needs to be consistent in being able to train all subjects. The whole-task trainer with force feedback proved to be better in that respect.

4.2 Statistical Analysis

It has been speculated above that the different training treatments had different training effects. However, for any of the conclusions to be mathematically valid, a statistical analysis of the results had to be done. If the means do not differ by much and the standard deviations are large, it cannot be concluded that the difference in the means is due to the trainer and not the sample variation. A two-factor ANOVA (analysis of variance) test was conducted on the results. As previously mentioned, it was assumed that the increase in performance was influenced by two factors: the training treatment received and the initial skill level of the subject. Thus, the assumed relationship for the total performance increase is given by the following equation.
    y_ijk = μ + α_i + β_j + γ_ij + ε_ijk    (4.1)

where y_ijk is the performance increase for the subject who underwent training treatment i, belongs to initial skill level group j, and has repetition index k; μ is the common effect; α_i is the training effect of trainer i; β_j is the effect of the subject's initial skill level group j; γ_ij is the interaction between factors α and β; and ε_ijk is the uncontrolled variation for this specific subject. Using this relationship, two hypotheses can be set up. The first hypothesis is that the effects of all four training treatments were the same:

    H0: α_1 = α_2 = α_3 = α_4    (4.2)

The second hypothesis is that the effects of the five initial skill level groups were the same:

    H1: β_1 = β_2 = β_3 = β_4 = β_5    (4.3)

The main interest is in showing that the first null hypothesis can be rejected at a significant confidence level. For this analysis, Table 4 shown in the previous chapter can be used. For the two-factor ANOVA test, the following sums of squares (SS), mean squares (MS) and degrees of freedom (df) needed to be calculated.

Table 5: ANOVA table calculations for a two-factor factorial test with n replicates of each treatment combination [44]. Here n = 1, a = 4 and b = 5 for this experiment; a dot in a subscript denotes summation over that index. Since n = 1, the interaction cannot be estimated separately from the error, so the interaction term serves as the error term in the tests, giving an error df of (a-1)(b-1) = 12.

    Source        df            SS                                                   MS
    α (columns)   a - 1         SSα = Σ_i y_i..²/(bn) - y...²/(abn)                  MSα = SSα / df
    β (rows)      b - 1         SSβ = Σ_j y_.j.²/(an) - y...²/(abn)                  MSβ = SSβ / df
    αβ            (a-1)(b-1)    SSγ = Σ_i Σ_j y_ij.²/n - y...²/(abn) - SSα - SSβ     MSγ = SSγ / df
    Error         ab(n - 1)     SSE = SST - SSα - SSβ - SSγ                          MSE = SSE / df
    Total         abn - 1       SST = Σ_i Σ_j Σ_k y_ijk² - y...²/(abn)

With the above calculations, the test statistic (F-statistic) could be found for each effect.
    F = MSα / MSE    for factor α    (4.4)
    F = MSβ / MSE    for factor β    (4.5)

From the statistic and the degrees of freedom, the confidence level for rejecting the null hypothesis could be found. The calculations were done for the experimental results in Table 4 using MATLAB's statistics toolbox. The results of the ANOVA calculations are shown below.

Table 6: ANOVA table of results

    Source               SS         df    MS         F        Prob>F
    Columns (trainer)    15.275     3     5.0917     2.83     0.0832
    Rows (skill group)   109.383    4     27.3458    15.21    0.0001
    Error                21.581     12    1.7984
    Total                146.239    19

The second hypothesis, H1, could easily be rejected at the 5% significance level (p=0.0001<0.05). Therefore, it can be concluded that the initial skill level had a significant effect on the training effectiveness. However, the null hypothesis of more interest, H0, could not be rejected at the 5% significance level (p=0.0832>0.05). Although there was a notable difference in means, as shown in Figure 4.1, the variance was unfortunately too large for the differences to be statistically significant. Large variations are very common in human factors studies, and such results were somewhat expected in this experiment since the sample size was fairly small. This was accentuated by the fact that a two-factor test requires a division of subjects into a 4 by 5 matrix. Therefore, although 20 subjects were used, there was only one subject per cell.

However, the fact that the statistics are not conclusive does not make the observations made in the previous section invalid. There were clear trends that could be noted, such as higher fidelity models producing better average improvement. What is needed is more subjects. If in fact the observations made in the previous sections were correct representations of the whole population, a larger sample size would decrease the variance and make the conclusions statistically valid. Thus, experiments were done with 9 additional subjects, and the results of the final analysis will be shown in a later section.
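As a worked check of Eqs. (4.4)-(4.5), the quantities in Table 6 above, and the Tukey statistics used in the pair-wise comparison that follows, can be recomputed in a short Python sketch. The studentized-range critical value (about 4.20 for 4 groups and 12 error df) is taken from standard tables and is an assumption of this sketch, since the thesis does not list it.

```python
import math

# Check of the two-factor ANOVA quantities in Table 6 (a = 4 trainers,
# b = 5 skill groups, n = 1 subject per cell; with one replicate the
# interaction term serves as the error term).
ss_total, ss_rows, ss_err = 146.239, 109.383, 21.581   # reported in Table 6
a, b = 4, 5
df_cols, df_rows, df_err = a - 1, b - 1, (a - 1) * (b - 1)
ss_cols = ss_total - ss_rows - ss_err                  # trainer factor SS, by subtraction
ms_cols, ms_rows, ms_err = ss_cols / df_cols, ss_rows / df_rows, ss_err / df_err
f_cols = ms_cols / ms_err                              # Eq. (4.4), factor alpha
f_rows = ms_rows / ms_err                              # Eq. (4.5), factor beta
print(f"trainer: SS={ss_cols:.3f} df={df_cols} MS={ms_cols:.4f} F={f_cols:.2f}")
print(f"skill:   SS={ss_rows:.3f} df={df_rows} MS={ms_rows:.4f} F={f_rows:.2f}")

# Tukey pair-wise statistics for the group means of Table 4:
# q = |mean_i - mean_j| / sqrt(MSE / n_per_group), compared against the
# studentized-range critical value q(0.05; k=4, df=12) of about 4.20
# (a standard table value, not listed in the thesis).
means = {"nonlinear": 4.640, "linear": 3.537, "no_haptics": 2.793, "part_task": 4.987}
se = math.sqrt(ms_err / 5)   # 5 subjects per treatment group
q_crit = 4.20
names = list(means)
for i, gi in enumerate(names):
    for gj in names[i + 1:]:
        q = abs(means[gi] - means[gj]) / se
        verdict = "significant" if q > q_crit else "not significant"
        print(f"{gi:>10} vs {gj:<10}: q = {q:.2f} ({verdict})")
```

The skill-group row reproduces MS = 27.3458 and F = 15.21 from Table 6; the trainer row gives F of about 2.83 (p = 0.0832 on 3 and 12 df); and no pair-wise q statistic reaches the critical value, consistent with the Tukey comparison below.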
A pair-wise comparison using a Tukey test for the four training treatments was also conducted. The training effect of a trainer was denoted as τ, and the Tukey test was done to show whether each training treatment was significantly different from the others. The results are shown in Table 7.

Table 7: Tukey pair-wise comparison of the hypotheses τ_nonlinear = τ_linear, τ_nonlinear = τ_no-haptics, τ_nonlinear = τ_part-task, and τ_linear = τ_no-haptics.

The comparisons show that none of the training effects can be concluded to be different from one another at the 5% significance level. This was partly expected, since the ANOVA test did not show a significant enough difference. However, the same test with more subjects showed a much better result, as will be presented later.

4.3 Learning Curve

Figure 4.3 shows the average learning curve for the individual learning curves shown in section 3.3.

Figure 4.3: Average learning curve for training on the VR trainer. Error bars show the standard deviation among subjects.

The learning curves for the linear elastic haptic model and the nonlinear model were virtually identical. In fact, there is no reason why the curves should be dissimilar, since subjects are performing the same tasks with the only difference being the model of the spring. The magnitudes of the forces exerted by the two spring models were approximately the same in the operating region, so whether the spring is linear or nonlinear should not matter in the learning curve if the subject was trained solely on one spring. (The performance of the subjects trained on the two different spring models might not be the same when they are evaluated on the real environment station. That is an entirely different matter, since the subject would be evaluated on a nonlinear spring and the performance might differ depending on whether they were trained on a linear or nonlinear elastic model.) The learning curves shown for the two haptic models have the conventional shape seen in most learning curves.
The curve is steep in the first few trials and levels off towards the end as the learning saturates. It is plainly obvious from the curves that training took place within the virtual environment: there is a significant increase in performance from beginning to end. However, as mentioned in the first chapter, how this training transfers to the real environment needs to be measured separately.

The learning curve for the training without haptics was somewhat different. The overall performance score was at a significantly lower level than the other two curves. Also, the curve was more erratic, with less of a steady upward trend. This shows how difficult it was to perform the tasks in the virtual environment without force feedback. The response to the training without haptics also varied widely among subjects, as shown by the large error bars. Some subjects were able to grasp how to control the tools without haptics within a few sessions, and others struggled from the start to the end of their training sessions. This resulted in the training effect having a high variance, as discussed previously.

4.4 Results from Further Subjects

Further experiments were conducted with 9 more subjects in an attempt to obtain better results. Although the initial round of experiments showed some encouraging results, the statistical analysis showed that the data were not entirely satisfactory for making clear-cut conclusions. The results obtained from the additional subjects are presented here in a separate section because these experiments were conducted after the initial analysis of the results and were not part of the original experimental design. The 9 subjects were initially evaluated at the real environment test station. The subjects were then divided into three training treatment groups: nonlinear haptics, linear haptics and no haptics.
None of the subjects were treated with part task training, because the first round of results had already shown that part task training was an effective method of training. The additional subjects were therefore used solely to determine the difference in training effectiveness of the three haptic models. The experimental methods were exactly the same as those described in the previous chapters. For the statistical analysis, the subjects were divided into eight initial skill level groups, so that it was possible to have one subject per group for each of the three training treatments. A total of 24 subjects were needed for this matrix. The results for the 15 subjects from the first round were kept, and the additional 9 results were combined to give results for 24 subjects.

The final results of the experiments are shown in Table 8 and Figure 4.4, in the same format as the results from the first round.

Table 8: Improvement in performance score after virtual reality training (revised).

    Initial Skill Group   Nonlinear Haptics   Linear Haptics   No Haptics
    1                     1.502               2.709            -0.515
    2                     1.507               -0.214           -0.496
    3                     3.590               2.297            3.263
    4                     5.203               6.198            2.164
    5                     4.894               3.173            -0.091
    6                     6.031               4.734            3.712
    7                     7.183               7.692            4.026
    8                     13.625              13.345           6.560
    mean                  5.4420              4.9917           2.5947

Figure 4.4: Average training effectiveness plot (revised). For each training model (nonlinear, linear, no haptics), the bars show the improvement in the total score and in the time, cut accuracy, push accuracy and obstacle avoidance sub-scores; the error bars show the Tukey interval.

The difference in the total score improvement between the haptic training modes and the training without haptics was more noticeable than before. Another notable change was that there is no significant difference between the score improvements for the nonlinear elastic and linear elastic models.
The improvement in each sub-metric score was comparable for the two haptic training modes, and again cutting and pushing accuracy deteriorated the most among the sub-scores when force feedback was absent. In Figure 4.4, the error bars represent the interval for which a Tukey pair-wise comparison would show a significant difference between the means. Therefore, if the intervals are disjoint, it can be concluded that the training effects are different at the 5% significance level. Showing this statistic as the error bar seemed more appropriate than plotting the standard deviation; the standard deviation was somewhat meaningless because only one factor was plotted on the x-axis for a performance increase that was affected by two factors.

A two-factor ANOVA test was conducted on the results shown in Table 8. This time the null hypothesis (that the training effects are the same),

    H0: α_1 = α_2 = α_3    (4.2)

could easily be rejected at 5% significance (p=0.026). The pair-wise comparison also showed improved results.

Table 9: Tukey pair-wise comparison (revised)

    Hypothesis     τ_nonlinear = τ_linear   τ_nonlinear = τ_no-haptics   τ_linear = τ_no-haptics
    Probability    0.81                     0.003                        0.011

The comparison shows that the training effect of the trainer without haptics was significantly lower than that of the trainers with force feedback. Also, there was no significant difference between the training transfer of the nonlinear elastic model and the linear elastic model. Therefore, as predicted in chapter 2, the accuracy of the elastic model did not appear to affect the training performance for a bimanual pushing task. However, force feedback did appear to be essential for effective training of surgical tasks in the virtual environment.

5 Conclusions and Future Work

5.1 Conclusions

The overall contribution of this thesis can be divided into two parts: the design of the experiment platform and the results from the initial rounds of experiments conducted on it.
The experiments yielded several conclusions. First of all, it was shown that positive training transfer can occur from virtual environment training for laparoscopic operations. This may seem like a trivial conclusion, but it is in fact an important one: this is the first time it has been shown that real environment surgical skills can be improved through virtual environment training alone. Such a conclusion could be made because the improvement in skill was a measure of performance in the real environment, and not a measurement of virtual world skills.

The next general conclusion is that force feedback appears to be important for effective surgical training. The skill level improvement for the trainers with haptic feedback was significantly higher than that of the trainer without haptics. There are currently many surgical simulations that rely solely on graphic feedback without haptics. These systems have their advantages, such as simpler hardware, lower computational requirements and lower cost. Also, the results seem to show that a virtual trainer without force feedback would in fact give a positive training transfer on average. However, there is no doubt that haptics enhances the training effect considerably and that performance is improved on a more consistent basis. Training with force feedback also seemed to become more important for surgical tasks that require more accuracy and delicate tool control. Therefore, it is safe to say that an effective surgical simulation is one that provides both graphic and haptic feedback.

The training effects of the linear elastic and the nonlinear elastic models were not significantly different for these tasks. This suggests that modeling the nonlinear elasticity of tissue for bimanual pushing tasks is not important. A simple linear approximation of the stiffness seems to result in training effects that are not significantly reduced, for stiffnesses that are comparable to organ tissue stiffnesses.
Therefore, it seems that accurate characterization of in-vivo force-displacement properties for the implementation of nonlinear models is not necessary in training for surgical pushing and pulling tasks. Due to hardware limitations, the additional forces on the surgical tools from the friction and inertia of the haptic interface and the resistive forces of the abdominal wall are too large for the small differences in stiffness to have any significant effect on training.

Finally, part task training proved to be an equally effective method for training simple surgical tasks. Therefore, training with part tasks should be considered a viable alternative to whole task methods for entry-level training. Obviously, the effectiveness of part task training will decline as the whole task becomes more complicated, so there are limits to what part task training as a stand-alone training device can achieve. However, part task training as a precursor to real environment whole task training may be effective. In many cases, creating an entire environment modeling human anatomy for a complete set of surgical tasks is challenging, whereas modeling the parts of organs where specific part tasks are to be performed is a much simpler undertaking. Therefore, developing part task trainers may be a way to maximize training effect under the technical limitations that developers currently face.

The preliminary results are promising. However, the more important contribution of this thesis up to this point is the design of the experimental platform. Through the design of this setup, a tool is now available for measuring the training effectiveness of a virtual environment simulation in terms of real world skills. It may seem that the significance of such measurements is debatable, since the objects in the simulation are not human organs and the tasks are not real surgical procedures.
However, the intention of the platform is not to train subjects to perform real surgery or to measure how much a subject's surgical skill has improved. The purpose of the setup is to provide a test bed for adjusting various parameters within the virtual environment and observing their effect on training transfer. The setup appears to serve that purpose well. Various training experiments can now be done on this platform to answer questions about the simulation fidelity required and the training methods that are most suitable for laparoscopic VR training. Thus this platform has provided "the missing link" between the real and virtual environments, whose absence has been the stumbling block for VR training effectiveness studies in the past.

5.2 Future Work

First priority is given to extending the initial study by experimenting with more levels of elastic fidelity. It would be interesting to see at what level of stiffness fidelity the training effect starts to deteriorate. This would give insight into whether or not the accuracy of modeling elasticity is important, and to what degree, given the limitations of the haptic interfaces that are used.

The experiments conducted with modeling nonlinear elasticity and part task versus whole task training are the first of many that can be done on this platform. Other aspects of haptic fidelity can be investigated, such as viscoelasticity, damping and the simulation of wet surfaces, among many others. Experiments can also be done with the fidelity of the graphics: photo-realistic texture, realistic shading and glistening could be implemented to various degrees of accuracy in order to investigate their effect on training. Results from such further experiments would be used to set guidelines for the fidelity required to achieve a particular level of training.
These guidelines would be useful for developers of surgical simulators in determining the degree of fidelity they need to aim for in designing an effective simulator. This would allow designers to focus their resources without wasting time and effort striving to make the simulation as realistic as possible.

The results from the experiments performed on this platform will also serve as the basis for future surgical simulation work at the Touchlab. The development of an effective surgical simulation is one of the ultimate goals of the lab. The general procedure for surgical simulation development is to measure the in-vivo material properties of human organs and establish a suitable model of the organ. The model would then be incorporated into a surgical simulator setup similar to the hardware shown in this study. The last step would be to test the simulator with subjects to validate the training transfer. This process is already underway at the lab for a simulation of Heller's myotomy.

References

[1] I. Hunter, T. Doukoglu, S. Lafontaine, P. Charatte, L. Jones, M. Sagar, G. Mallison, and P. Hunter, "A teleoperated microsurgical robot and associated virtual environment for eye surgery," Presence, vol. 2, pp. 265-280, 1993.

[2] D. Ota, B. Loftin, T. Saito, R. Lea, and J. Keller, "Virtual reality in surgical education," Computers in Biology and Medicine, vol. 25, pp. 127-137, 1995.

[3] J. Rosen, A. Lasko-Harvill, and R. Satava, "Virtual reality and surgery," in Computer-Integrated Surgery: Technology and Clinical Applications, R. Taylor, S. Lavallee, and G. Burdea, Eds.: The MIT Press, 1996, pp. 231-243.

[4] R. Satava, "Virtual Reality Surgical Simulator: The First Steps," Journal of Surgical Endoscopy, vol. 7, pp. 203-205, 1993.

[5] S. Delp, J. Loan, M. Hoy, F. Zajac, E. Topp, and J. Rosen, "An interactive graphics-based model of the lower extremity to study orthopedic surgical procedures," IEEE Transactions on Biomedical Engineering, vol. 37, pp. 757-767, 1990.

[6] U.
Kuhnapfel, C. Kuhn, M. Hubner, H. Krumm, H. Maab, and B. Neisius, "The Karlsruhe Endoscopic Surgery Trainer as an Example for Virtual Reality in Medical Education," Minimally Invasive Therapy and Allied Technologies (MITAT), vol. 6, pp. 122-125, 1997.

[7] C. Basdogan, C. Ho, and M. A. Srinivasan, "Force Interaction in Laparoscopic Simulation: Haptic Rendering of Soft Tissue," presented at Medicine Meets Virtual Reality, 1998.

[8] S. Cotin, H. Delingette, and N. Ayache, "Real-time elastic deformations of soft tissue for surgery simulation," IEEE Transactions on Visualization and Computer Graphics, vol. 5, pp. 62-73, 1999.

[9] M. Bro-Nielsen, "Finite Element Modeling in Surgery Simulation," Proceedings of the IEEE, vol. 86, pp. 490-503, 1998.

[10] S. Pieper, J. Rosen, and D. Zeltzer, "Interactive Graphics for Plastic Surgery: A Task Level Analysis and Implementation," Proceedings of Computer Graphics, pp. 127-134, 1992.

[11] S. De, J. Kim, and M. A. Srinivasan, "A Meshless Numerical Technique for Physically Based Real Time Medical Simulations," presented at Medicine Meets Virtual Reality, 2001.

[12] F. Tendick, M. C. Cavusoglu, et al., "A Virtual Environment Testbed for Training Laparoscopic Surgical Skill," Presence, vol. 9, pp. 236-255, 2000.

[13] A. Derossis, G. Fried, M. Abrahamowicz, H. Sigman, J. Barkun, and J. Meakins, "Development of a Model for Training and Evaluation of Laparoscopic Skills," The American Journal of Surgery, vol. 175, pp. 482-487, 1998.

[14] C. Sutton, R. McCloy, A. Middlebrook, P. Chater, M. Wilson, and R. Stone, "A Laparoscopic Surgery Procedures Trainer and Evaluator," presented at Medicine Meets Virtual Reality, 1997.

[15] S. Payandeh, A. Lomax, J. Dill, C. Mackenzie, and C. Cao, "On Defining Metrics for Assessing Laparoscopic Surgical Skills in a Virtual Training Environment," presented at Medicine Meets Virtual Reality, 2002.

[16] R. Adams, D. Klowden, and B. Hannaford, "Virtual Training for a Manual Assembly Task," Haptics-e, vol. 2, 2001.

[17] J.
Kozak, P. Hancock, E. Arthur, and S. Chrysler, "Transfer of training from virtual reality," Ergonomics, vol. 36, pp. 777-784, 1993.

[18] B. Witmer, J. Bailey, B. Knerr, and K. Parsons, "Virtual spaces and real world places: Transfer of route knowledge," International Journal of Human-Computer Studies, vol. 45, pp. 413-428, 1996.

[19] J. Bliss, P. Tidwell, and M. Guest, "The effectiveness of virtual reality for administering spatial navigation training for firefighters," Presence, vol. 6, pp. 73-86, 1997.

[20] R. Boian, A. Sharma, C. Han, A. Merians, G. Burdea, S. Adamovich, M. Recce, M. Termaine, and H. Poizner, "Virtual Reality-Based Post-Stroke Hand Rehabilitation," presented at Medicine Meets Virtual Reality, 2002.

[21] T. Carretta and R. Dunlap, "Transfer of effectiveness in flight simulation: 1986 to 1997," Air Force Research Laboratory, NTIS, 1998.

[22] G. Lintern, S. Roscoe, J. Koonce, and L. Segal, "Transfer of landing skills in beginning flight training," Human Factors, vol. 32, pp. 319-327, 1990.

[23] Advisory Group for Aerospace Research and Development (AGARD), "Fidelity of simulation for pilot training," NATO, 1980.

[24] D. Kurts and C. Gainer, "The Use of a Dedicated Testbed to Evaluate Simulator Training Effectiveness," presented at AGARD, Piloted Simulation Effectiveness, 1992.

[25] C. Wagner, N. Stylopoulos, and R. Howe, "The Role of Force Feedback in Surgery: Analysis of Blunt Dissection," presented at the 10th Annual Haptic Symposium, Orlando, 2002.

[26] C. Lathan, M. Tracey, M. Sebrechts, D. Clawson, and G. Higgins, "Using Virtual Environments as Training Simulators: Measuring Transfer," in Handbook of Virtual Environments, K. Stanney, Ed.: Lawrence Erlbaum Associates, 2002.

[27] N. Taffinder, C. Sutton, R. Fishwick, I. MacManus, and A. Darzi, "Validation of Virtual Reality To Teach and Assess Psychomotor Skills in Laparoscopic Surgery: Results from Randomised Controlled Studies Using the MIST VR Laparoscopic Simulator," presented at Medicine Meets Virtual Reality, 1998.

[28] A. Chaudhry, C. Sutton, J. Wood, R. Stone, and R.
McCloy, "Learning rate for laparoscopic surgical skills on MIST VR, a virtual reality simulator: quality of human-computer interface," Ann R Coll Surg Engl, vol. 81, pp. 281-286, 1999.

[29] G. Chubb and P. Macy, "Microsoft Flight Simulator Suitability for Cross Country Exercises for Private Pilot Training," presented at AIAA Modeling and Simulation Technologies Conference, 1997.

[30] G. Anderson, "A method for aircraft simulation verification and validation developed at the United States Air Force flight simulation facility," presented at AGARD, Flight Simulation, 1986.

[31] M. Bonner and D. Gingras, "Evaluation of the Navy's F/A-18 A/D Powered Approach Aerodynamics Model," presented at AIAA Modeling and Simulation Technologies Conference, New Orleans, LA, 1997.

[32] J. Rolfe and K. Staples, Flight Simulation, in Cambridge Aerospace Series. Cambridge: Cambridge University Press, 1986.

[33] D. Kurts and C. Gainer, "The Use of a Dedicated Testbed to Evaluate Simulator Training Effectiveness," presented at AGARD, Piloted Simulation Effectiveness, 1992.

[34] M. O'Rourke, J. Ralston, J. Bell, and S. Lash, "PC-Based Simulation of the F16/MATV," presented at AIAA Modeling and Simulation Technologies Conference, New Orleans, LA, 1997.

[35] K. Neville, "Industry Initiative for Revised Training Simulator Validating Process," presented at AIAA Modeling and Simulation Technologies Conference, New Orleans, LA, 1997.

[36] J. Rosen, M. MacFarlane, C. Richards, B. Hannaford, and M. Sinanan, "Surgeon-Tool Force/Torque Signatures: Evaluation of Surgical Skills in Minimally Invasive Surgery," presented at the MMVR Conference, 1999.

[37] R. O'Toole, R. Playter, T. Krummel, W. Blank, N. Cornelius, W. Roberts, W. Bell, and M. Raibert, "Assessing Skill and Learning in Surgeons and Medical Students Using a Force Feedback Surgical Simulator," presented at MICCAI, Cambridge, MA, 1998.

[38] J. Rosser, L. Rosser, and R.
Savalgi, "Skill Acquisition and Assessment for Laparoscopic Surgery," Arch Surg, vol. 132, pp. 200-204, 1997.

[39] L. Moody, C. Barber, and T. Arvanitis, "Objective Surgical Performance Evaluation Based on Haptic Feedback," presented at Medicine Meets Virtual Reality, 2002.

[40] J. Rosen, C. Richards, B. Hannaford, and M. Sinanan, "Hidden Markov Models of Minimally Invasive Surgery," presented at Medicine Meets Virtual Reality, 2000.

[41] M. Srinivasan, in Virtual Reality: Scientific and Technical Challenges, Report of the Committee on Virtual Reality Research and Development, N. Durlach and A. Mavor, Eds.: National Research Council, National Academy Press, 1995.

[42] S. Kilbreath and S. Gandevia, "Neural and biomechanical specialization of human thumb muscles revealed by matching weights and grasping objects," Journal of Physiology, vol. 472, pp. 537-556, 1993.

[43] L. A. Jones and I. W. Hunter, "Influence of the Mechanical Properties of a Manipulandum on Human Operator Dynamics; Part I. Elastic Stiffness," Biol. Cybern., vol. 62, pp. 299-307, 1990.

[44] W. Gardiner and G. Gettinby, Experimental Design Techniques in Statistical Practice: A Practical Software-based Approach. Horwood Publishing Ltd., 1998.