Towards a Parallel Coupled Multi-Scale Model of Magnetic Reconnection
Michal Charemza
Tony Arber
May 26, 2007
Contents
1. Magnetic Reconnection
2. Lagrangian Remap and Lare2D
3. Adaptive Mesh Refinement Lagrangian Remap and Larma
4. Parallel AMR Lagrangian Remap
5. Further Work
Magnetic Reconnection
[Figure: time sequence of reconnecting magnetic field lines]
Dependent on large-scale boundary conditions
Affected by small-scale non-fluid effects
Start of time step
At time step n, the solution is known on the Eulerian grid
Solution known from the previous step
After Lagrangian Step
At time step n + 1, the solution is known on the Lagrangian grid
Some numerical time-dependent method is used
After Remap Step
At time step n + 1, the solution is known on the Eulerian grid
A geometrical method is used (see the sketch below)
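To make the two-step cycle concrete, here is a minimal one-dimensional Lagrangian-remap sketch for advecting a density profile. This is not the Lare2D scheme (which is 2D MHD on a staggered grid); the grid size, velocity, piecewise-constant remap, and open boundaries are illustrative assumptions.

```c
/* Minimal 1D Lagrangian-remap sketch (illustrative, not the Lare2D scheme).
   Lagrangian step: cell edges move with the flow, cell mass is conserved.
   Remap step: geometric overlaps redistribute mass back to the fixed grid. */
#include <stdio.h>
#define N 64

int main(void) {
    double xe[N + 1], xl[N + 1];          /* Eulerian and Lagrangian edges */
    double rho[N], mass[N], rnew[N];
    double dx = 1.0 / N, u = 1.0, dt = 0.2 * dx;

    for (int i = 0; i <= N; i++) xe[i] = i * dx;
    for (int i = 0; i < N; i++)  rho[i] = (i > N / 4 && i < N / 2) ? 2.0 : 1.0;

    for (int step = 0; step < 100; step++) {
        /* Lagrangian step: edges advect; density follows from fixed mass */
        for (int i = 0; i <= N; i++) xl[i] = xe[i] + u * dt;
        for (int i = 0; i < N; i++)  mass[i] = rho[i] * dx;

        /* Remap step: purely geometrical overlap of each Lagrangian cell
           with the Eulerian cells (piecewise constant; boundaries open,
           so no inflow at the left edge in this toy version) */
        for (int j = 0; j < N; j++) rnew[j] = 0.0;
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++) {
                double lo = xl[i] > xe[j] ? xl[i] : xe[j];
                double hi = xl[i + 1] < xe[j + 1] ? xl[i + 1] : xe[j + 1];
                if (hi > lo)
                    rnew[j] += mass[i] * (hi - lo) / (xl[i + 1] - xl[i]) / dx;
            }
        for (int j = 0; j < N; j++) rho[j] = rnew[j];
    }
    for (int j = 0; j < N; j++) printf("%d %f\n", j, rho[j]);
    return 0;
}
```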
Lare2D
Solves the resistive and Hall MHD equations
[Figure: staggered grid with velocities (vx, vy, vz) at cell corners, Bx and By on cell faces, and ρ, ε, Bz at cell centres]
What is AMR?
Adaptive Mesh Refinement:
A technique for extending a numerical method by using different grid resolutions in different areas of the domain
Higher grid resolution only where, and when, desired
Higher resolution typically in regions where variables change rapidly (see the sketch below)
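As a concrete, hypothetical example of flagging "regions of high change", a refinement pass might mark cells where the relative jump in a variable exceeds a threshold. The variable choice and threshold are illustrative assumptions, not taken from Larma.

```c
/* Hypothetical refinement flagging: mark cells where the relative jump in
   a variable (here density) between neighbours exceeds a threshold. */
#include <math.h>

void flag_for_refinement(const double *rho, int n, double threshold,
                         int *refine /* out: 1 = refine this cell */)
{
    for (int i = 1; i < n - 1; i++) {
        double jump = fabs(rho[i + 1] - rho[i - 1]) /
                      (fabs(rho[i]) + 1e-12);      /* avoid divide-by-zero */
        refine[i] = jump > threshold;
    }
    refine[0] = refine[n - 1] = 0;                 /* skip boundary cells */
}
```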
Advantages of AMR
Speed: faster than equivalent non-AMR code
Memory: less memory used than equivalent non-AMR code
Disadvantages of AMR
Complex code
Computational time used to communicate between refinement levels
Computational time used to navigate data structures
Can be more difficult to parallelise
Typical Lare2D computational domain
[Figure: single uniform grid covering the whole domain]
AMR Larma computational domain
[Figure: base grid with nested refined patches]
Orszag-Tang Vortex
Initial conditions: ρ = 25/9, p = 5/3
vx = −sin y,  vy = sin x,  vz = 0
Bx = −sin y,  By = sin 2x,  Bz = 0
0 ≤ x, y ≤ 2π, time from 0 to π, periodic boundary conditions
Simple initial conditions lead to shocks (see the sketch below)
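A minimal sketch of how these initial conditions might be filled on a cell-centred grid; the array names and layout are illustrative assumptions, not the Lare2D staggered arrangement.

```c
/* Illustrative Orszag-Tang initialisation on an N-by-N cell-centred grid */
#include <math.h>
#define N 256

static double rho[N][N], p[N][N], vx[N][N], vy[N][N], bx[N][N], by[N][N];

void init_orszag_tang(void)
{
    const double L = 2.0 * acos(-1.0);   /* domain length 2*pi */
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            double x = (i + 0.5) * L / N, y = (j + 0.5) * L / N;
            rho[i][j] = 25.0 / 9.0;
            p[i][j]   = 5.0 / 3.0;
            vx[i][j]  = -sin(y);  vy[i][j] = sin(x);       /* vz = 0 */
            bx[i][j]  = -sin(y);  by[i][j] = sin(2.0 * x); /* bz = 0 */
        }
}
```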
Test Results
[Figure: Larma output for the Orszag-Tang vortex]
AMR Patch Placement
[Figure: refined patches clustered around the shock regions]
Speedup of Orszag-Tang Vortex
[Figure: runtime (sec, 0 to 120,000) against resolution (grid cells, 0 to 3,500) for AMR and non-AMR runs]
Memory usage of Orszag-Tang Vortex
[Figure: number of cells (0 to 1.2 × 10⁷) against resolution for AMR and non-AMR runs]
Problems
Choice of architecture (MPI)
Load balancing
Communication time
What and when to communicate
Processing the results
Domain decomposition with Ghost Patches
[Figure: patch grid split between Node 1 and Node 2; each node holds ghost copies of neighbouring patches owned by the other node]
Ghost Patches: Consequences
Majority of inter-patch communication code can be reused
Need some way for nodes to tell each other to create (and remove) patches
Potentially many messages per time step sent to update ghost patches; high MPI latency per message ⇒ lots of time waiting for messages to complete
AMR Coordinates
[Figure: coarse grid indexed 1 to 8 in x and y, split between Node 1 and Node 2]
Within each parent, refined patches are numbered 1 to 4, and the digits concatenate through the levels (e.g. the children of patch 2 are 21, 22, 23, 24)
A patch is therefore identified by its coarse cell plus its refinement path, e.g. coordinates (5, 6, 33): coarse cell (5, 6), sub-patch 3, then sub-patch 3 within that (see the sketch below)
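A minimal sketch of a data structure for this coordinate scheme; the type and field names are illustrative assumptions, not the Larma source.

```c
/* Hypothetical representation of an AMR coordinate such as (5, 6, 33):
   a coarse cell plus the sub-patch index (1-4) chosen at each level. */
#include <stdio.h>
#define MAX_LEVELS 8

typedef struct {
    int coarse_x, coarse_y;   /* coarse-grid cell, e.g. (5, 6)           */
    int level;                /* refinement depth below the coarse grid  */
    int path[MAX_LEVELS];     /* sub-patch number (1-4) at each level    */
} patch_coord;

void print_coord(const patch_coord *c)
{
    printf("(%d, %d, ", c->coarse_x, c->coarse_y);
    for (int l = 0; l < c->level; l++) printf("%d", c->path[l]);
    printf(")\n");            /* e.g. "(5, 6, 33)" for path {3, 3}       */
}
```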
MPI Struct to combine messages
[Figure: the same Node 1 / Node 2 patch grid]
48 messages combined into one (see the sketch below)
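A hedged sketch of the combining idea using an MPI derived datatype: many small ghost-patch buffers at different addresses are described by one MPI_Type_create_struct type and sent in a single message, paying the per-message latency once. The buffer layout and counts are illustrative, not the Larma source.

```c
/* Combine many small ghost-patch updates into one MPI message. */
#include <mpi.h>

#define NPATCH 48        /* e.g. 48 boundary messages combined into one */
#define PATCH_CELLS 64   /* doubles per patch boundary, illustrative    */

void send_combined(double *bufs[NPATCH], int dest, MPI_Comm comm)
{
    int          blocklens[NPATCH];
    MPI_Aint     displs[NPATCH];
    MPI_Datatype types[NPATCH], combined;

    for (int i = 0; i < NPATCH; i++) {
        blocklens[i] = PATCH_CELLS;
        types[i]     = MPI_DOUBLE;
        MPI_Get_address(bufs[i], &displs[i]);   /* absolute addresses */
    }
    MPI_Type_create_struct(NPATCH, blocklens, displs, types, &combined);
    MPI_Type_commit(&combined);

    /* With absolute displacements the buffer argument is MPI_BOTTOM:
       one send carries all 48 patch updates. */
    MPI_Send(MPI_BOTTOM, 1, combined, dest, 0, comm);
    MPI_Type_free(&combined);
}
```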
Non-Blocking Communication
[Figure: the same Node 1 / Node 2 patch grid]
Communicate boundary regions while solving internal regions (see the sketch below)
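A hedged sketch of the overlap: start the ghost exchange with non-blocking calls, solve the patches that need no remote data, then wait and finish the boundary patches. The function and buffer names are illustrative, not from Larma.

```c
/* Overlap ghost-patch exchange with interior work via non-blocking MPI. */
#include <mpi.h>

void solve_interior_patches(void);   /* placeholders for the real solver */
void solve_boundary_patches(void);

void exchange_and_solve(double *ghost_send, double *ghost_recv, int n,
                        int neighbour, MPI_Comm comm)
{
    MPI_Request req[2];

    /* Start the ghost-patch exchange without waiting for it to finish */
    MPI_Irecv(ghost_recv, n, MPI_DOUBLE, neighbour, 0, comm, &req[0]);
    MPI_Isend(ghost_send, n, MPI_DOUBLE, neighbour, 0, comm, &req[1]);

    /* Meanwhile, solve patches that do not touch the node boundary */
    solve_interior_patches();

    /* Ghost data must have arrived before boundary patches are solved */
    MPI_Waitall(2, req, MPI_STATUSES_IGNORE);
    solve_boundary_patches();
}
```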
Current State
Implemented: ghost patches, AMR coordinates, to-do list, MPI message combining
Non-blocking communication: none yet, but the patch structure should make this possible
Load balancing: none yet. Different domain sizes are possible; other methods could require more of a rewrite
Efficiency Metric
Efficiency of a run on N processors = (runtime on 1 processor) / (N × runtime on N processors)
Load-imbalance Metric
Pattern of communication/processing is the same on all processors
Time lost to load imbalance for each computational block = max over processors p of (compute time for p − average compute time)
Use timing calls in the code (see the sketch below)
Averages calculated at the end of the run to minimize communication
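A hedged sketch of how such a metric might be instrumented: per-block compute times are accumulated locally with MPI_Wtime and reduced only once, at the end of the run, so the metric itself costs almost no communication. Names and block count are illustrative assumptions.

```c
/* Per-block timing with a single end-of-run reduction. */
#include <mpi.h>
#include <stdio.h>

#define NBLOCKS 1000   /* illustrative number of computational blocks */
static double block_time[NBLOCKS];

void timed_block(int b, void (*work)(void))
{
    double t0 = MPI_Wtime();
    work();
    block_time[b] += MPI_Wtime() - t0;   /* accumulate locally */
}

void report_imbalance(MPI_Comm comm)
{
    double tmax[NBLOCKS], tsum[NBLOCKS];
    int nproc, rank;
    MPI_Comm_size(comm, &nproc);
    MPI_Comm_rank(comm, &rank);

    /* One reduction at the end of the run keeps the metric cheap */
    MPI_Reduce(block_time, tmax, NBLOCKS, MPI_DOUBLE, MPI_MAX, 0, comm);
    MPI_Reduce(block_time, tsum, NBLOCKS, MPI_DOUBLE, MPI_SUM, 0, comm);

    if (rank == 0) {
        double lost = 0.0;
        for (int b = 0; b < NBLOCKS; b++)
            lost += tmax[b] - tsum[b] / nproc;   /* max - average per block */
        printf("time lost to load imbalance: %g s\n", lost);
    }
}
```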
Efficiency Results
[Figure: efficiency of parallel AMR running the Orszag-Tang problem with 3 levels of refinement on mhdcluster; efficiency (0.2 to 1) against number of processors (0 to 70), showing timed results and the projected efficiency if load balanced]
Communication Time
For the 64-processor run:
runtime = 2 hours
lost due to load imbalance = 72 min
Using the mpiP profiler:
data sent per node = 82 GB
messages sent per node = 633,000
Using the SKaMPI benchmarker on mhdcluster:
bandwidth ≈ 350 MB/s
latency ≈ 2.45 µs
⇒ bandwidth time ≈ 4 min (82 GB ÷ 350 MB/s), latency time ≈ 1.55 s (633,000 × 2.45 µs)
With perfect scaling the runtime would be 17 min; subtracting the 72 min of imbalance, ≈4 min of bandwidth time, and 17 min of ideal compute from the 2-hour runtime leaves roughly 25% unaccounted for.
Further Work
Find the missing 25% of runtime
Load balance the parallel AMR code
Implement non-blocking communication
Couple to a parallel Vlasov code