Optimisation of Monte Carlo codes for High Performance Computing in Radiotherapy Applications
aka The Full Monte!

Dr Iwan Cornelius, M.B. Flegg, C.M. Poole, Prof Christian Langton
Faculty of Science and Technology
Queensland University of Technology
Queensland Cancer Physics Collaborative
CRICOS No. 00213J Queensland University of Technology
AstroMed09, 14-16th December, The University of Sydney

Outline
• Introduction
• Development of a LINAC Monte Carlo model using GEANT4
• Optimisation
• Future Directions
• Conclusions

Introduction: Radiotherapy
• LINAC: produce highly controllable source of MeV photons
  – Energy
  – Gantry angle
  – Patient position
  – Multi Leaf Collimators (MLCs) to define arbitrary shaped fields

Introduction: Radiotherapy
• Planning
  – Patient imaged
  – PTV OAR Contoured
  – Optimisation of fields to conform Dose to tumour and spare healthy tissue
• Delivery
  – Fractionated
• Based on analytical calculations
  – Can be inaccurate in regions of high heterogeneity

Monte Carlo
• What is it?
• How is it used in radiotherapy?
  – Treatment plan verification
  – Support new dosimetry measurements used in QA
• What tools exist?
  – EGSnrc/BEAMnrc, PENELOPE, MCNPX, GEANT4
• Challenges to overcome
  – Reduce Computation times (maintain accuracy)
    • Code optimisation
    • Variance reduction
    • High Performance Computing (HPC)
  – Usability

High Performance Computing
• Monte Carlo: trivial to parallelise
  – Launch identical application with unique random number generator seed
  – Collate results
• Centralised Clusters
  – Multiple machines, Beowulf
  – Multiple CPU, Shared memory (SGI Altix)
• Cons
  – Look better on paper
  – Sharing resource with other users
  – Often limited in # of processors, wait in queue
• Single machine, multiple processors
  – Dual quad core
  – With hyper-threading, can get 16 logical cores

High Performance Computing: GPGPU
• General Purpose Graphics Processing Units
  – Hundreds of processors on a chip
  – NVIDIA Tesla C1060: PCIe, 240 cores per card, 4GB memory
• CUDA
  – Compute Unified Device Architecture
  – Write ‘kernel’ in ‘C for CUDA’ to run on the GPU
  – Copy from main memory to device memory
  – Kernel executes on GPU
  – Copies result back to main memory
  – Great for loops
• How to ‘Accelerate’ Monte Carlo codes with GPUs
  – Re-engineer entire code into C for CUDA kernels
  – Re-write computationally intensive portions of code into ‘kernels’ using CUDA
  – Calculation time doesn’t scale with # of processors

GEANT4
• Toolkit of C++ classes
  – Primary beam, geometry, physics processes, scoring
  – User must create their own application based on these
• Very powerful general purpose Monte Carlo tool
  – High energy physics, space physics, medical physics, optics, radiation protection, astrophysics

GEANT4
• Pros
  – Extremely flexible
  – Time dependent geometries
  – Radioactive decay, Neutron transport
  – Various visualisation tools
• Cons
  – Extremely flexible
  – Requires proficiency with C++ programming
  – Steep learning curve
  – Deterrent for first time users
  – Hospital based Medical Physicists with limited research time

The Full Monte!
• Create generic LINAC application using GEANT4
  – Capable of modelling Elekta, Varian, Siemens LINACs
  – Do for GEANT4 what BEAMnrc did for EGSnrc (just text inputs)
  – Accurate. Verify against experimental data.
• Optimise for HPC environments (Desktop Supercomputer)
  – Distribute over available CPUs
  – Port to the GPU
• User interface
  – Simple text-file based interface
  – Graphical User Interface
• Interface with TPS
  – Able to routinely verify treatment plans

Geometry
• Varian 2100 Clinac
  – Dimensions, material composition from Varian Docs
• Target
• Primary Collimator
• Vacuum window

Geometry
• Flattening filter
  – Compensate for forward peaked distribution of bremsstrahlung photons
• Ionisation chamber
  – Monitor total Dose delivery

Geometry
• Jaws
  – Define square fields

Geometry
• Multi-Leaf Collimators (MLCs)
  – Interleaved Tungsten leaves
  – Varian Millennium
  – Brad Oborn (UoW)

Primary Beam
• Monoenergetic electron beam
• Normally incident on target
• Gaussian spread radially

Physics
• Photons
  – Photoelectric effect
  – Compton scattering
  – Gamma conversion
• Electrons
  – Multiple scattering
  – Ionisation
  – Bremsstrahlung
• Positrons
  – As for electrons
  – Annihilation

Scoring
• Water Phantom
  – 50 cm x 50 cm x 50 cm
  – Score in voxelised geometry

Validation / Commissioning
• Comparison with ionisation chamber measurements in a water phantom
  – Scanning in x, y, z
• Dose along beam axis

Validation: Tune Electron Beam Energy
• Tuning of electron beam energy for best match
  – 10 cm x 10 cm field
  – Compare between 10-30 cm depths

Results: Tune Electron Beam Energy
• Comparison with ionisation chamber measurements in water
• Tuning of electron beam energy for best match
  – 10 cm x 10 cm field
  – Compare between 10-30 cm depths

Results: 5.85 MeV, 10 cm x 10 cm
• Within 2% agreement between 0.5 cm and 38 cm

Results: 5.85 MeV, 5 cm x 5 cm
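The energy-tuning step described in the validation slides can be sketched roughly as follows. The slides do not state which figure of merit was used, so this sketch assumes the largest absolute percent difference (relative to the local measured value) over the 10-30 cm comparison window. The curves are purely synthetic exponentials, not the authors' data, and all function names and energy labels are hypothetical.

```python
import math

def percent_diff(simulated, measured):
    """Point-by-point difference as a percentage of the measured value."""
    return [100.0 * (s - m) / m for s, m in zip(simulated, measured)]

def worst_in_window(depths_cm, simulated, measured, lo_cm, hi_cm):
    """Largest absolute percent difference for depths in [lo_cm, hi_cm]."""
    diffs = percent_diff(simulated, measured)
    return max(abs(d) for z, d in zip(depths_cm, diffs) if lo_cm <= z <= hi_cm)

def best_energy(depths_cm, measured, candidates, lo_cm=10.0, hi_cm=30.0):
    """Return the candidate label whose curve best matches the measurement."""
    return min(
        candidates,
        key=lambda label: worst_in_window(
            depths_cm, candidates[label], measured, lo_cm, hi_cm
        ),
    )

# Synthetic stand-ins for depth-dose curves: a toy exponential fall-off.
depths = [float(z) for z in range(0, 40, 5)]  # 0, 5, ..., 35 cm

def toy_curve(mu_per_cm):
    return [math.exp(-mu_per_cm * z) for z in depths]

measured = toy_curve(0.0500)        # stand-in for the ion chamber scan
candidates = {                      # stand-ins for runs at trial energies
    "E_low": toy_curve(0.0540),
    "E_mid": toy_curve(0.0505),
    "E_high": toy_curve(0.0460),
}

chosen = best_energy(depths, measured, candidates)
worst = worst_in_window(depths, candidates[chosen], measured, 10.0, 30.0)
print(chosen, round(worst, 2))
```

The same `worst_in_window` helper could be called with a wider depth range to express an agreement statement like the "within 2%" result quoted above, again under the assumed definition of percent difference.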

Results: 5.85 MeV, 20 cm x 20 cm

Results: 5.85 MeV, 40 cm x 40 cm

Optimisations
• No Optimisation
  – Many photons produced will never reach the sensitive region of the geometry

Optimisations
• Kill zones
  – Nothing fancy-pants
  – Terminate histories that are unlikely to contribute to observable
  – Above target
  – Around primary collimator
• Relative Computation Time: 78 %

Optimisations
• Phase space files
  – Some aspects of geometry don’t change
  – Create pre-calculated radiation field at plane
  – Sample this population to conserve computation times
• Relative Computation Time: 38 %
• 380 hrs, O(10^10)

HPC: GPU/CPU Desktop Supercomputer
• Purchase of Xenon T5 Desktop Supercomputer
  – “The Terminator”
  – 4 x C1060 Tesla card = 960 cores!
  – 2 x quad core processors
    • hyper-threading
    • Linux ‘sees’ 16 processors
• NVIDIA Professorial partnership grant
  – Awarded 3 x C1060 Tesla cards
• Research team learning CUDA
  – Mark Harris, local CUDA guru

Optimisations: Parallelise on CPUs
• Message Passing Interface (MPI)
  – Run identical simulation on different cores, each with a unique random number seed
  – Geant4 MPImanager class
  – Time scales roughly linearly with number of processors
  – Simulations in 24 hrs, O(10^10)
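The "identical simulation, unique seed, collate results" pattern from the slide above can be illustrated with a toy problem. This is a minimal sketch in plain Python, not GEANT4 or MPI code: the jobs run serially here, the physics is a single made-up attenuation coefficient, and the names `run_job`, `collate` and the seed scheme are assumptions for illustration only.

```python
import math
import random

def run_job(seed, n_histories, mu_per_cm=0.05, depth_cm=10.0):
    """One independent job: the same toy simulation with its own seed.

    Counts photons whose first interaction lies beyond depth_cm, with free
    paths drawn from an exponential distribution of rate mu_per_cm.
    """
    rng = random.Random(seed)  # private generator, so streams never mix
    transmitted = 0
    for _ in range(n_histories):
        if rng.expovariate(mu_per_cm) > depth_cm:
            transmitted += 1
    return transmitted, n_histories

def collate(results):
    """Merge per-job tallies into one estimate of the transmitted fraction."""
    total_transmitted = sum(t for t, _ in results)
    total_histories = sum(n for _, n in results)
    return total_transmitted / total_histories

# One seed per job; under MPI each rank could derive its seed from its rank.
seeds = [1000 + rank for rank in range(16)]
results = [run_job(seed, 20000) for seed in seeds]  # serial stand-in for ranks
estimate = collate(results)
expected = math.exp(-0.05 * 10.0)  # analytic answer for this toy problem
print(f"estimate={estimate:.4f} expected={expected:.4f}")
```

Because the jobs share nothing except the final sum, wall-clock time divides by the number of processors to a good approximation. If the 380 hrs quoted for the phase space run is a single-processor time, then 380 / 16 ≈ 24 hrs on the 16 logical processors of the desktop machine, which would be consistent with the figure on the slide.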

The GPU Dilemma
• 1. Re-write entire code into C for CUDA?
  – C for CUDA doesn’t support sophisticated data types (classes)
  – O(10^6) lines of code, dozens of developers
  – Wait for CUDA to catch up (?)
• 2. Create C++ wrapper classes for certain methods
  – First step, random number generator
  – Incorporated into GEANT4 framework via inheritance
  – Implementing Mersenne Twister algorithm (hack example from CUDA SDK) to generate cache of random numbers
  – Improvement of only a few percent

Profiling!
• Great first step when optimising code
• Linux: gprof requires re-compiling with flags set
• MacOSX
  – Profiling tool doesn’t require recompile

Conclusions
• GEANT4 LINAC application has been developed
  – Specific to Varian Clinac
  – Many parameters hard-coded
  – Work commenced on text-file based UI commands
  – Preliminary validation promising
• Optimisation
  – Phase space files
  – Kill zones
  – MPI for parallel processing on CPUs
  – Porting random number generator to GPU

Future Directions
• Validation
  – Verify dose distributions in heterogeneous phantoms
  – Verify model of MLCs (irregular fields)
  – Develop interface to Treatment Planning System
• Optimisation
  – Re-write part of GEANT4 to run on GPU
• Interface
  – User friendly text-file based commands
• Treatment Plan interface
  – Implement DICOM-RT interface
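The random-number cache idea from the GPU Dilemma slide (a wrapper class that inherits from the framework's engine interface and serves numbers from a pre-generated batch) can be sketched as below. This is an illustration in plain Python under loud assumptions: `UniformEngine` is a hypothetical stand-in for the framework's engine base class, the batch is generated on the host with Python's own Mersenne Twister rather than on a GPU, and none of the names come from GEANT4 or the CUDA SDK.

```python
import random

class UniformEngine:
    """Hypothetical stand-in for the framework's random engine interface."""

    def flat(self):
        """Return one uniform number in [0, 1)."""
        raise NotImplementedError

class CachedEngine(UniformEngine):
    """Serves numbers one at a time from a cache that is refilled in batches.

    batch_generator(n) must return n uniform numbers; in the approach on the
    slide this is where a GPU kernel would fill the cache.
    """

    def __init__(self, batch_generator, batch_size):
        self._generate = batch_generator
        self._batch_size = batch_size
        self._cache = []
        self._next = 0
        self.refills = 0  # how many batches have been requested so far

    def flat(self):
        if self._next >= len(self._cache):  # cache exhausted: refill it
            self._cache = self._generate(self._batch_size)
            self._next = 0
            self.refills += 1
        value = self._cache[self._next]
        self._next += 1
        return value

def host_batch(rng):
    """Host-side batch generator, used here in place of a GPU kernel."""
    def generate(n):
        return [rng.random() for _ in range(n)]
    return generate

# Calling code only sees the base-class interface, so it needs no changes.
engine = CachedEngine(host_batch(random.Random(42)), batch_size=1024)
samples = [engine.flat() for _ in range(3000)]
print(len(samples), engine.refills)
```

A cache like this only speeds up the random-number calls themselves, so the overall gain is capped by the fraction of run time spent generating random numbers. That may be one reason the reported improvement was only a few percent, and it is the kind of question profiling can answer.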

Acknowledgements
• QUT
  – Scott Crowe, Tanya Kairn, Andrew Fielding
    • discussion on Varian LINAC model, Experimental data
  – Mark Barry, Mark O'Dwyer
    • discussion on CPU optimisation, High Performance Computing
• Mater Hospital, Brisbane
  – Radiation Oncology Group
• UoW
  – Brad Oborn
    • Millennium MLC model
• GEANT4 Collaboration
  – Joseph Perl (SLAC)
    • discussion on visualisation / profiling
• NVIDIA
  – Mark Harris