Towards a Parallel Coupled Multi-Scale Model of Magnetic Reconnection
Michal Charemza, Tony Arber
May 26, 2007

Contents
1 Magnetic Reconnection
2 Lagrangian Remap and Lare2D
3 Adaptive Mesh Refinement Lagrangian Remap and Larma
4 Parallel AMR Lagrangian Remap
5 Further Work

Magnetic Reconnection
- Dependent on large-scale boundary conditions
- Affected by small-scale non-fluid effects

Start of time step
- At time step n, the solution is known on the Eulerian grid
- The solution is known from the previous step

After Lagrangian step
- At time step n + 1, the solution is known on the Lagrangian grid
- Some numerical time-dependent method is used

After remap step
- At time step n + 1, the solution is known on the Eulerian grid
- A purely geometrical method is used (see the one-dimensional sketch below)

Lare2D
- Solves the resistive and Hall MHD equations
- [Figure: staggered grid layout: velocities (vx, vy, vz) at cell corners; Bx and By on cell faces; ρ, ε and Bz at cell centres]
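The Lagrangian-remap cycle is easiest to see in one dimension. Below is a minimal sketch of one time step for a single advected density: the cell boundaries first move with the flow (Lagrangian step), then the mass is transferred geometrically back onto the fixed Eulerian grid (remap step). This is an illustration under assumed simplifications (uniform 1D grid, first-order overlaps, boundaries ignored), not the Lare2D code itself.

```c
/* Sketch of one 1D Lagrangian-remap time step for density. */
#include <stdio.h>

#define NX 100

int main(void) {
    double xb[NX + 1], xb_new[NX + 1]; /* cell boundary positions      */
    double rho[NX], rho_new[NX];       /* density on Eulerian cells    */
    double vb[NX + 1];                 /* velocity at cell boundaries  */
    double dx = 1.0 / NX, dt = 0.001;

    /* Eulerian grid and an assumed initial state. */
    for (int i = 0; i <= NX; i++) { xb[i] = i * dx; vb[i] = 0.5; }
    for (int i = 0; i < NX; i++)  rho[i] = (i < NX / 2) ? 2.0 : 1.0;

    /* Lagrangian step: boundaries move with the fluid; the mass inside
     * each Lagrangian cell is unchanged. */
    for (int i = 0; i <= NX; i++) xb_new[i] = xb[i] + dt * vb[i];

    /* Remap step: purely geometrical.  The overlap of Lagrangian cell i
     * with Eulerian cell j decides how much mass j receives (first order;
     * domain boundaries ignored for brevity). */
    for (int j = 0; j < NX; j++) rho_new[j] = 0.0;
    for (int i = 0; i < NX; i++) {
        double mass = rho[i] * (xb_new[i + 1] - xb_new[i]);
        for (int j = 0; j < NX; j++) {
            double lo = (xb_new[i] > xb[j]) ? xb_new[i] : xb[j];
            double hi = (xb_new[i + 1] < xb[j + 1]) ? xb_new[i + 1] : xb[j + 1];
            if (hi > lo)
                rho_new[j] += mass * (hi - lo) / (xb_new[i + 1] - xb_new[i]);
        }
    }
    for (int j = 0; j < NX; j++) rho[j] = rho_new[j] / dx;

    printf("rho[0] = %f\n", rho[0]);
    return 0;
}
```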
What is AMR?
- Adaptive Mesh Refinement: a technique for extending a numerical method by using different grid resolutions in different areas of the domain
- Higher grid resolution only where, and when, it is needed
- Higher resolution typically in regions where the variables change rapidly

Advantages of AMR
- Speed: faster than equivalent non-AMR code
- Memory: uses less memory than equivalent non-AMR code

Disadvantages of AMR
- Complex code
- Computational time spent communicating between refinement levels
- Computational time spent navigating data structures
- Can be more difficult to parallelise

[Figure: typical Lare2D computational domain]
[Figure: AMR Larma computational domain, with refined patches covering part of the domain]

Orszag-Tang vortex initial conditions
ρ = 25/9, p = 5/3
vx = −sin y,  vy = sin x,  vz = 0
Bx = −sin y,  By = sin 2x,  Bz = 0
0 ≤ x, y ≤ 2π, time from 0 to π, periodic boundary conditions
- Simple initial conditions lead to shocks
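For concreteness, here is a sketch of laying these initial conditions onto a uniform grid. The resolution and the collocated array layout are my own assumptions for illustration; Lare2D itself staggers the variables as shown earlier.

```c
/* Sketch: Orszag-Tang vortex initial conditions on a uniform NX x NY
 * collocated grid over [0, 2*pi) x [0, 2*pi). */
#include <math.h>

#define NX 256
#define NY 256

static double rho[NX][NY], p[NX][NY];
static double vx[NX][NY], vy[NX][NY];
static double bx[NX][NY], by[NX][NY];

void init_orszag_tang(void) {
    const double two_pi = 2.0 * M_PI;
    for (int i = 0; i < NX; i++) {
        for (int j = 0; j < NY; j++) {
            double x = two_pi * i / NX;
            double y = two_pi * j / NY;
            rho[i][j] = 25.0 / 9.0;
            p[i][j]   = 5.0 / 3.0;
            vx[i][j]  = -sin(y);
            vy[i][j]  =  sin(x);
            bx[i][j]  = -sin(y);
            by[i][j]  =  sin(2.0 * x);
            /* vz = Bz = 0 everywhere */
        }
    }
}
```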
Test results
[Figure: Orszag-Tang vortex test results]

AMR patch placement
[Figure: placement of refined patches over the vortex]

Speedup of Orszag-Tang vortex
[Figure: runtime (seconds) against resolution (grid cells) for AMR and non-AMR runs]

Memory usage of Orszag-Tang vortex
[Figure: number of cells against resolution for AMR and non-AMR runs]

Problems in parallelising AMR
- Choice of architecture (MPI)
- Load balancing
- Communication time
- What and when to communicate
- Processing the results

Domain decomposition with ghost patches
[Figure: patches divided between Node 1 and Node 2, each node holding ghost copies of neighbouring patches]

Ghost patches: consequences
- The majority of the inter-patch communication code can be reused
- Nodes need some way to tell each other to create (and remove) patches
- Potentially many messages per time step are sent to update ghost patches: high MPI latency per message means a lot of time spent waiting for messages to complete

AMR coordinates
- Each patch is identified by a tuple of coordinates, one component per refinement level
- For example, the highlighted patch has coordinates (5, 6, 33)
[Figure: patch numbering on each refinement level, across Node 1 and Node 2]
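A minimal sketch of how such coordinates might be represented and compared. The struct layout, fixed depth and field names are my assumptions for illustration, not the Larma implementation.

```c
/* Sketch: a patch is named by one index per refinement level, so any
 * node can refer to a patch unambiguously in a message. */
#define MAX_LEVELS 3

typedef struct {
    int level;              /* how many components are meaningful       */
    int index[MAX_LEVELS];  /* e.g. {5, 6, 33} for the example patch    */
} AmrCoord;

/* Coordinates give a total order, so patch lists held on different
 * nodes can be matched up without exchanging pointers. */
int amr_coord_cmp(const AmrCoord *a, const AmrCoord *b) {
    for (int l = 0; l < MAX_LEVELS; l++) {
        int ai = (l < a->level) ? a->index[l] : -1;
        int bi = (l < b->level) ? b->index[l] : -1;
        if (ai != bi) return (ai < bi) ? -1 : 1;
    }
    return 0;
}
```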
MPI structs to combine messages
- An MPI derived datatype combines many per-patch messages into one, paying the latency cost once
[Figure: 48 messages between Node 1 and Node 2 combined into one]

Non-blocking communication
- Communicate boundary regions while solving internal regions

Current state
Implemented:
- Ghost patches
- AMR coordinates
- MPI message combining
To do:
- Non-blocking communication: none yet, but the patch structure should make it possible (a sketch follows below)
- Load balancing: none yet; different domain sizes per node are possible, but other methods could require more of a rewrite
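A sketch combining the last two ideas: one derived datatype gathers all patch buffers bound for a neighbour into a single message, and non-blocking calls let interior work proceed while it is in flight. This is what the to-do item could look like, not existing code; solve_internal_patches and solve_boundary_patches are hypothetical.

```c
#include <mpi.h>
#include <stdlib.h>

#define PATCH_CELLS (16 * 16)

extern void solve_internal_patches(void);  /* hypothetical */
extern void solve_boundary_patches(void);  /* hypothetical */

/* Build a datatype describing n scattered patch buffers, so one
 * Isend/Irecv moves what would otherwise be n messages. */
MPI_Datatype combined_patch_type(double *patches[], int n) {
    int          *lens  = malloc(n * sizeof *lens);
    MPI_Aint     *disps = malloc(n * sizeof *disps);
    MPI_Datatype *types = malloc(n * sizeof *types);
    for (int i = 0; i < n; i++) {
        lens[i]  = PATCH_CELLS;
        types[i] = MPI_DOUBLE;
        MPI_Get_address(patches[i], &disps[i]);
    }
    MPI_Datatype t;
    MPI_Type_create_struct(n, lens, disps, types, &t);
    MPI_Type_commit(&t);
    free(lens); free(disps); free(types);
    return t;
}

void exchange_and_solve(double *send_patches[], int n_send,
                        double *recv_patches[], int n_recv,
                        int neighbour) {
    MPI_Datatype sendt = combined_patch_type(send_patches, n_send);
    MPI_Datatype recvt = combined_patch_type(recv_patches, n_recv);
    MPI_Request req[2];

    /* Absolute addresses were used above, so the buffer is MPI_BOTTOM. */
    MPI_Isend(MPI_BOTTOM, 1, sendt, neighbour, 0, MPI_COMM_WORLD, &req[0]);
    MPI_Irecv(MPI_BOTTOM, 1, recvt, neighbour, 0, MPI_COMM_WORLD, &req[1]);

    solve_internal_patches();   /* work needing no ghost data */

    MPI_Waitall(2, req, MPI_STATUSES_IGNORE);
    solve_boundary_patches();   /* ghost patches now current */

    MPI_Type_free(&sendt);
    MPI_Type_free(&recvt);
}
```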
Efficiency metric

    efficiency on N processors = (runtime on 1 processor) / (N × runtime on N processors)

Load-imbalance metric
- Pattern of communication/processing is the same on all processors
- Time lost to load imbalance for each computational block
      = max over processors p (compute time on p − average compute time)
- Measured with timing calls in the code (sketched below)
- Averages are calculated at the end of the run to minimise communication
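A sketch of that measurement: time each computational block with MPI_Wtime, then reduce once at the end of the run so no extra messages are sent during time steps. The block naming and indexing are my own.

```c
#include <mpi.h>
#include <stdio.h>

#define N_BLOCKS 16            /* e.g. Lagrangian step, remap, ... */
static double block_time[N_BLOCKS];

void time_block(int block, void (*work)(void)) {
    double t0 = MPI_Wtime();
    work();
    block_time[block] += MPI_Wtime() - t0;
}

/* End of run: per-block imbalance = max over processors - average. */
void report_imbalance(void) {
    int rank, nproc;
    double t_max[N_BLOCKS], t_sum[N_BLOCKS];
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nproc);
    MPI_Reduce(block_time, t_max, N_BLOCKS, MPI_DOUBLE, MPI_MAX,
               0, MPI_COMM_WORLD);
    MPI_Reduce(block_time, t_sum, N_BLOCKS, MPI_DOUBLE, MPI_SUM,
               0, MPI_COMM_WORLD);
    if (rank == 0)
        for (int b = 0; b < N_BLOCKS; b++)
            printf("block %d: lost %.3f s to imbalance\n",
                   b, t_max[b] - t_sum[b] / nproc);
}
```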
Efficiency results
[Figure: efficiency of parallel AMR running the Orszag-Tang problem with 3 levels of refinement on mhdcluster, plotted against number of processors (up to 64), for the timed results and for the hypothetical load-balanced case]

Communication time (64-processor run)
- Runtime = 2 hours; time lost to load imbalance = 72 minutes
- From the mpiP profiler: data sent per node = 82 GB, messages sent per node = 633k
- From the SKaMPI benchmarker on mhdcluster: bandwidth ≈ 350 MB/s and latency ≈ 2.45 µs, giving a bandwidth time of ≈ 4 minutes and a latency time of ≈ 1.55 seconds
- With perfect scaling the runtime would be 17 minutes, leaving roughly 25% of the runtime unaccounted for (see the accounting below)

Further work
- Find the missing 25%
- Load balance parallel AMR
- Implement non-blocking communication
- Couple to a parallel Vlasov code
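For reference, the "missing 25%" follows from this back-of-envelope accounting, reconstructed from the numbers on the communication-time slide:

```latex
\begin{aligned}
\text{bandwidth time} &\approx \frac{82\ \text{GB}}{350\ \text{MB/s}} \approx 234\ \text{s} \approx 4\ \text{min},\\
\text{latency time}   &\approx 633{,}000 \times 2.45\ \mu\text{s} \approx 1.55\ \text{s},\\
\text{unaccounted}    &\approx (120 - 17 - 72 - 4)\ \text{min} \approx 27\ \text{min},
\end{aligned}
```

that is, roughly a quarter of the two-hour runtime.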