CSE-633 Parallel Algorithms
Spring 2009
Stationary Temperature Distribution In A Square Plate
Jesper Dybdahl Hede
UB#: 35964041
Problem to Solve
n x n size plate, n² fields.
Heaters and coolers on the edges: heaters at 100 degrees, coolers at -100 degrees.
[Figure: n x n plate with heaters and coolers along the edges.]
Problem to Solve
Iterative process:
• In iteration i a new temperature value is calculated for each field on the plate, using the temperatures from iteration i-1:

  T_{x,y}(i) = ( T_{x-1,y}(i-1) + T_{x+1,y}(i-1) + T_{x,y-1}(i-1) + T_{x,y+1}(i-1) ) / 4

[Figure: plate coordinates x, y with the fixed edge temperatures of 100 and -100.]
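For concreteness, a minimal sketch of this update in C (array names and layout are illustrative, not taken from the original program; the border cells are assumed to hold the fixed heater/cooler temperatures):

```c
/* One iteration over an n x n plate stored in (n+2) x (n+2) arrays whose
 * border cells hold the fixed heater/cooler temperatures (100 / -100).
 * old_t holds iteration i-1, new_t receives iteration i. */
void iterate_once(int n, double old_t[n + 2][n + 2], double new_t[n + 2][n + 2])
{
    for (int x = 1; x <= n; x++)
        for (int y = 1; y <= n; y++)
            new_t[x][y] = (old_t[x - 1][y] + old_t[x + 1][y] +
                           old_t[x][y - 1] + old_t[x][y + 1]) / 4.0;
}
```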
Solving the Problem
• Use several processors to solve the problem.
• h processors horizontally, v processors vertically: h*v processors in total.
• Each processor gets an equal part of the plate, i.e. a smaller rectangle.
• Each processor does all calculations within its rectangle and communicates the values on its edges with the neighbouring processors.
• Processors on the plate edges use the values from the heaters/coolers, i.e. static values.
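As an illustration, a sketch of how a processor could derive its sub-rectangle from its rank in the h-by-v grid (the row-major rank layout and all names are assumptions, not taken from the original program):

```c
/* Hypothetical decomposition: rank r in a row-major h-by-v grid of PEs
 * owns a nearly equal block of the n x n plate. */
typedef struct { int x0, x1, y0, y1; } Rect;   /* half-open [x0,x1) x [y0,y1) */

Rect my_rectangle(int n, int h, int v, int rank)
{
    int col = rank % h;           /* position along the horizontal axis */
    int row = rank / h;           /* position along the vertical axis   */
    Rect r;
    r.x0 = col * n / h;  r.x1 = (col + 1) * n / h;
    r.y0 = row * n / v;  r.y1 = (row + 1) * n / v;
    return r;
}
```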
Algorithm
0 Initial calculations (find neighbours/setup arrays)
1 Loop:
2
Calculate new values for each field rectangle
3
Synchronize with neighbours
4 Gather and save result in text file
Basically, the loop is being parallellized.
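A condensed MPI sketch of this structure is shown below. It is an assumption-laden sketch, not the original program: it uses a simple row-block decomposition, only shows the exchange with the vertical neighbours, uses illustrative constants, and omits the boundary initialization and the final gather.

```c
#include <mpi.h>
#include <stdlib.h>

#define N_LOCAL 50          /* fields per PE in each direction (assumed) */
#define ITERS   2000        /* total iterations (assumed)                */

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Local block plus one ghost row above and below. In the real program
     * the outermost plate rows/columns would be set to the heater/cooler
     * temperatures (100 / -100); that is omitted here. */
    double (*t)[N_LOCAL + 2]    = calloc(N_LOCAL + 2, sizeof *t);
    double (*tnew)[N_LOCAL + 2] = calloc(N_LOCAL + 2, sizeof *tnew);

    int up   = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
    int down = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;

    for (int it = 0; it < ITERS; it++) {
        /* Step 2: calculate new values for each interior field */
        for (int x = 1; x <= N_LOCAL; x++)
            for (int y = 1; y <= N_LOCAL; y++)
                tnew[x][y] = (t[x - 1][y] + t[x + 1][y] +
                              t[x][y - 1] + t[x][y + 1]) / 4.0;

        /* swap buffers */
        double (*tmp)[N_LOCAL + 2] = t; t = tnew; tnew = tmp;

        /* Step 3: synchronize edge rows with the vertical neighbours */
        MPI_Sendrecv(t[1],           N_LOCAL + 2, MPI_DOUBLE, up,   0,
                     t[N_LOCAL + 1], N_LOCAL + 2, MPI_DOUBLE, down, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Sendrecv(t[N_LOCAL],     N_LOCAL + 2, MPI_DOUBLE, down, 1,
                     t[0],           N_LOCAL + 2, MPI_DOUBLE, up,   1,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }

    /* Step 4: gathering and writing the result to a text file is omitted */
    free(t); free(tnew);
    MPI_Finalize();
    return 0;
}
```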
Implementation
h-by-v mesh
Communication diameter: Θ(max(h,v))
Bisection width: Θ(min(h,v))
h-by-v instead of x-by-x → more options for #PEs.
Loop stop condition:
• Each processor communicates an INT to its neighbours along with the edge values.
• if (PE updates over threshold) send 0
• else send (min known value + 1)
• min known value is the minimum of the PE's own value and the values received in the last iteration.
• Any PE: if min known value > v+h, then end the loop.
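A sketch of this stop condition from one PE's point of view (function and variable names are illustrative, not from the original program):

```c
/* Termination check piggybacked on the edge exchange.
 * A PE sends 0 while it still sees updates above the threshold, and
 * otherwise sends its minimum known value + 1. Once the minimum known
 * value exceeds h + v, the "everybody has converged" news has had time
 * to cross the whole mesh, so the loop can end. */
int outgoing_token(double max_update, double threshold, int min_known)
{
    if (max_update > threshold)
        return 0;                 /* this PE is still changing */
    return min_known + 1;
}

int update_min_known(int own_token, const int *neighbour_tokens, int n_neighbours)
{
    int m = own_token;
    for (int i = 0; i < n_neighbours; i++)
        if (neighbour_tokens[i] < m)
            m = neighbour_tokens[i];
    return m;                     /* end the loop once m > h + v */
}
```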
Result: Temperature in plate
[Surface chart of the resulting temperature distribution over the plate; values range from -100 to 100 degrees.]
Problem!
Program turns out to be less than perfect for parallelization.
[Chart: running time (s) vs. #PEs (4, 6, 9).]
Problem!
• MPI communication dominates the arithmetic calculations.
• Not able to make the plate as large as wanted (got an error above 150x150 fields).
• Solution?
– Do more iterations of calculation between communications, as sketched below.
• Reduces #sync rounds from ~2000 to ~25.
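A sketch of this batching idea, written as a rewrite of the main loop in the MPI skeleton shown earlier (same assumed variables; SYNC_EVERY = 100 matches the sweet spot reported below; between exchanges the ghost cells keep their last received values, so fields near the block edges temporarily use slightly stale data):

```c
#define SYNC_EVERY 100   /* local iterations between edge exchanges (assumed) */

for (int it = 0; it < ITERS; it += SYNC_EVERY) {
    /* several local iterations without any communication */
    for (int k = 0; k < SYNC_EVERY; k++) {
        for (int x = 1; x <= N_LOCAL; x++)
            for (int y = 1; y <= N_LOCAL; y++)
                tnew[x][y] = (t[x - 1][y] + t[x + 1][y] +
                              t[x][y - 1] + t[x][y + 1]) / 4.0;
        double (*tmp)[N_LOCAL + 2] = t; t = tnew; tnew = tmp;
    }

    /* one edge exchange per SYNC_EVERY iterations */
    MPI_Sendrecv(t[1],           N_LOCAL + 2, MPI_DOUBLE, up,   0,
                 t[N_LOCAL + 1], N_LOCAL + 2, MPI_DOUBLE, down, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    MPI_Sendrecv(t[N_LOCAL],     N_LOCAL + 2, MPI_DOUBLE, down, 1,
                 t[0],           N_LOCAL + 2, MPI_DOUBLE, up,   1,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
}
```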
[Chart: running time (s) vs. number of iterations between syncs (0 to 1200), measured with 4 PEs.]
• 100 iterations per synchronization is the sweet spot!
[Chart: running time (s) vs. #PEs (4, 6, 9) for 1, 100, 500, 1000 and 1500 iterations between syncs.]
Results: Running time
• Runs with varying #PEs, with 100 iterations per sync. Average over 10 runs.
[Chart: running time (s) vs. #PEs, from 0 up to about 160 PEs.]
Results: Running time
• Runs with varying #PEs, with 100 iterations per sync. Average over 10 runs.
– Minimum at 64 PEs: 0.112559 s.
– Little gain beyond 30 PEs.
– Smaller speedup than expected.
Results: Speedup
• Runs with varying #PEs, with 100 iterations per sync. Average over 10 runs.
[Chart: speedup vs. #PEs, from 0 up to about 160 PEs.]