Black Box Clustering and Parallel H-LU

Black Box Clustering and Parallel
H-LU Factorisation
Ronald Kriemann
Max Planck Institute for Mathematics in the Sciences, Leipzig
Winterschool on H-Matrices 2009
Black Box Clustering and Parallel H-LU Factorisation
HLIBpro
Overview
1 Motivation
2 Graph Partitioning
3 Admissibility
4 Nested Dissection
5 Parallelisation
Motivation
Model Problem
Consider
$-\Delta u = 0$ in $\Omega = [0, 1]^2$.
Using a uniform grid with step width $h$ and standard piecewise linear finite elements with nodal points $x_i$, $i \in I$, one obtains the stiffness matrix $A$ as
[Figure: stiffness matrix $A$ of the model problem]
Matrix Graph
Define the matrix graph $G(A) = (V_A, E_A)$ of $A \in \mathbb{R}^{I \times I}$ as
$V_A := I$,
$E_A := \{(i, j) \in I \times I : i \neq j \wedge a_{ij} \neq 0\}$,
i.e. edges in the graph are defined by the sparsity pattern of the stiffness matrix.
Remark
Non-zero entries $a_{ij}$ only exist in $A$ if the nodes $i$ and $j$ are neighboured.
For the model problem the matrix graph looks as
[Figure: matrix graph of the model problem]
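The definition of $G(A)$ can be sketched in a few lines of Python. This is only an illustration: the function names `five_point_stencil` and `matrix_graph` are hypothetical, and the 5-point stencil is assumed as a stand-in for the stiffness matrix of the model problem.

```python
def five_point_stencil(m):
    """Entries {(i, j): a_ij} of the 5-point stencil matrix on an m x m grid."""
    entries = {}
    for r in range(m):
        for c in range(m):
            i = r * m + c
            entries[(i, i)] = 4.0
            for rr, cc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if 0 <= rr < m and 0 <= cc < m:
                    entries[(i, rr * m + cc)] = -1.0
    return entries


def matrix_graph(entries, n):
    """Adjacency sets of G(A): i and j are connected iff i != j and a_ij != 0."""
    adj = {i: set() for i in range(n)}
    for (i, j), a_ij in entries.items():
        if i != j and a_ij != 0:
            adj[i].add(j)
            adj[j].add(i)  # symmetric sparsity pattern: undirected graph
    return adj


m = 4
adj = matrix_graph(five_point_stencil(m), m * m)
print(sorted(adj[5]))  # neighbours of the interior node in row 1, column 1
```

Only the sparsity pattern enters the graph; the values of the entries are ignored apart from the test against zero.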
Define the distance $d_G(i, j)$ between nodes $i, j \in I$ as the length of the shortest path in $G(A)$. Then, for $i, j \in I$ we have:
$\|x_i - x_j\|_2 \le d_G(i, j)\, h$,
i.e. distance in $\mathbb{R}^2$ is mapped to distance in $G(A)$.
Example (figure: grid with nodes $i$, $j$ and $k$):
$\|x_i - x_j\|_2 = \sqrt{13}\, h$, $d_G(i, j) = 5$,
$\|x_i - x_k\|_2 = \sqrt{5}\, h$, $d_G(i, k) = 3$
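The graph distance can be computed by a breadth first search. The following sketch checks the bound for the first example above on a small grid; the helper names `grid_graph` and `bfs_distances` and the grid size are assumptions made for illustration.

```python
from collections import deque
from math import hypot


def grid_graph(m):
    """Adjacency lists of an m x m grid graph (nodes numbered row by row)."""
    adj = {}
    for r in range(m):
        for c in range(m):
            adj[r * m + c] = [rr * m + cc
                              for rr, cc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1))
                              if 0 <= rr < m and 0 <= cc < m]
    return adj


def bfs_distances(adj, start):
    """Graph distance d_G(start, v) for all nodes v reachable from start."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


m = 6
h = 1.0 / (m - 1)
adj = grid_graph(m)
i, j = 0, 3 * m + 2            # j lies 3 rows and 2 columns away from i
d = bfs_distances(adj, i)
euclid = hypot(3 * h, 2 * h)   # equals sqrt(13) * h
print(d[j], euclid <= d[j] * h)
```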
Clustering via Graph Distance
Since nodes in $G(A)$ with small distance are geometrically neighboured, one can use the graph distance to cluster indices.
[Figure: index set $I$ split into sub clusters $I_0$ and $I_1$]
Recursively partition the sub graphs to construct the cluster tree.
Graph Partitioning
Requirements
Let $A \in \mathbb{R}^{I \times I}$ be a sparse matrix and $G = G(A) = (V_A, E_A)$ the corresponding matrix graph. Furthermore, let
$\operatorname{diam}(G) := \max_{i,j \in V_A} d_G(i, j)$,
$\operatorname{diam}_G(V) := \max_{i,j \in V} d_G(i, j)$, $V \subseteq V_A$,
denote the diameter of the graph and of a sub graph, respectively.
For cluster tree construction, one needs a graph partitioning algorithm with the following properties:
• compact sub graphs (small diameter),
• small edge-cut (small number of edges connecting sub graphs).
Remark
No edges between sub graphs corresponds to decoupled clusters and therefore to a block diagonal matrix.
Partitioning via Breadth First Search
Algorithm:
1 determine two nodes $i, j \in V_A$ with (almost) maximal distance,
2 perform simultaneous BFS from $i$ and $j$ to construct sub clusters:
• per step, add unvisited neighbours of nodes in sub clusters,
3 recurse in sub graphs.
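Steps 1 and 2 can be sketched as follows. The names `bfs_levels`, `far_pair` and `bfs_bisect` are hypothetical, and the two BFS sweeps used to find a far pair are one common heuristic chosen here as an assumption; the slides do not say how the pair is determined. The sketch assumes at least two nodes.

```python
from collections import deque


def bfs_levels(adj, nodes, start):
    """BFS distances from start, restricted to the node set `nodes`."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v in nodes and v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def far_pair(adj, nodes):
    """Heuristic for two nodes with (almost) maximal distance: two BFS sweeps."""
    d = bfs_levels(adj, nodes, min(nodes))
    i = max(d, key=lambda v: (d[v], v))
    d = bfs_levels(adj, nodes, i)
    j = max(d, key=lambda v: (d[v], v))
    return i, j


def bfs_bisect(adj, nodes):
    """Split `nodes` into two sub clusters by simultaneous BFS from a far pair."""
    nodes = set(nodes)
    i, j = far_pair(adj, nodes)
    owner = {i: 0, j: 1}
    fronts = [[i], [j]]
    while fronts[0] or fronts[1]:
        for side in (0, 1):              # one BFS step per sub cluster
            new_front = []
            for u in fronts[side]:
                for v in adj[u]:
                    if v in nodes and v not in owner:
                        owner[v] = side
                        new_front.append(v)
            fronts[side] = new_front
    parts = [{v for v in owner if owner[v] == s} for s in (0, 1)]
    for v in nodes - set(owner):         # unreachable nodes go to the smaller part
        min(parts, key=len).add(v)
    return parts


path = {k: [n for n in (k - 1, k + 1) if 0 <= n < 8] for k in range(8)}
print([sorted(p) for p in bfs_bisect(path, range(8))])
```

Step 3 then calls `bfs_bisect` again on each of the two returned node sets.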
General Graph Partitioning for Clustering
BFS based graph partitioning yields compact sub graphs, but not necessarily a minimal edge-cut. The edge-cut can be improved using the Fiduccia-Mattheyses algorithm (see Literature).
[Figure: two partitions of the same graph with #edge-cut 8 and 6]
In graph theory, the graph partitioning problem is defined as:
Given a graph $G = (V, E)$, a partitioning $P = \{V_1, V_2\}$ of $V$, with $V_1 \cap V_2 = \emptyset$ and $V = V_1 \cup V_2$, is sought such that
$\#V_1 \sim \#V_2$
and
$I_G(V_1, V_2) := \#\{(i, j) \in E : i \in V_1 \wedge j \in V_2\} = \min$.
Unfortunately, the graph partitioning problem is NP-hard. But good approximation algorithms exist and are implemented in open source software libraries, e.g.:
• METIS, Scotch (multi-level graph partitioning),
• CHACO (multi-level and spectral graph partitioning).
General black box clustering algorithm:
function blackbox_cluster( G = (V, E) )
  if #V ≤ n_min then
    return cluster t := V;
  else
    {G_1, G_2} := partition( G );
    t_1 := blackbox_cluster( G_1 );
    t_2 := blackbox_cluster( G_2 );
    return cluster t := V with S(t) := {t_1, t_2};
  end if
end
Here, partition implements the general graph partitioning algorithm, e.g. from METIS, and S(t) denotes the set of sons of the cluster t.
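A runnable version of the recursion might look as follows. The class `Cluster`, the pluggable `partition` argument and the placeholder partitioner `halve` are assumptions made for this sketch; `halve` ignores the graph entirely and only stands in for a real partitioner such as the BFS bisection or a library call.

```python
class Cluster:
    """Cluster tree node: an index set and its sons S(t)."""

    def __init__(self, indices, sons=()):
        self.indices = sorted(indices)
        self.sons = list(sons)


def blackbox_cluster(nodes, adj, partition, n_min=2):
    """Recursively bisect `nodes` until at most n_min indices remain."""
    nodes = set(nodes)
    if len(nodes) <= n_min:
        return Cluster(nodes)
    part1, part2 = partition(nodes, adj)
    if not part1 or not part2:          # guard against a degenerate partition
        return Cluster(nodes)
    t1 = blackbox_cluster(part1, adj, partition, n_min)
    t2 = blackbox_cluster(part2, adj, partition, n_min)
    return Cluster(nodes, [t1, t2])


def halve(nodes, adj):
    """Placeholder partitioner (ignores adj): split the sorted indices in half."""
    order = sorted(nodes)
    return set(order[:len(order) // 2]), set(order[len(order) // 2:])


def leaves(t):
    """Index sets of all leaf clusters, from left to right."""
    return [t.indices] if not t.sons else [x for s in t.sons for x in leaves(s)]


adj = {k: [n for n in (k - 1, k + 1) if 0 <= n < 8] for k in range(8)}
tree = blackbox_cluster(range(8), adj, halve, n_min=2)
print(leaves(tree))
```

Passing the partitioner as an argument keeps the tree construction independent of the partitioning method, which matches the black box idea of the pseudocode.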
[Figure: partitions of an example graph computed by four methods: BFS ($\#I_G = 21$), BFS+FM ($\#I_G = 11$), METIS ($\#I_G = 12$), Scotch ($\#I_G = 12$)]
Admissibility
Prerequisites
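Standard admissibility is defined by
$\min(\operatorname{diam}(\Omega_t), \operatorname{diam}(\Omega_s)) \le \eta \operatorname{dist}(\Omega_t, \Omega_s)$
with support $\Omega_i$ for each cluster $i$ and, hence, uses geometrical data, which is not available in the black box setting.
Distance in Graphs
For $V_1, V_2 \subset V$, the distance between $V_1$ and $V_2$ is defined as
$\operatorname{dist}_G(V_1, V_2) := \min_{i \in V_1, j \in V_2} \operatorname{dist}_G(i, j)$
with
$\operatorname{dist}_G(i, j)$ := length of shortest path between $i$ and $j$ in $G$ (denoted $d_G(i, j)$ before).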
Weak Admissibility
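The simplest admissibility condition for a block cluster $(t, s)$ is defined by
$\operatorname{adm}_{weak}(t, s) := \begin{cases} \text{true}, & \text{if } \operatorname{dist}_G(t, s) > 1, \\ \text{false}, & \text{otherwise}, \end{cases}$
i.e. a block is admissible if no edge connects $t$ and $s$ in $G$.
[Figure: clusters $t_1$, $t_2$ and $t_3$]
$\operatorname{adm}_{weak}(t_1, t_2) = \text{true}$
$\operatorname{adm}_{weak}(t_1, t_3) = \text{false}$
Weak admissibility is cheap to test and produces effective partitions for H-arithmetics (see experiments).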
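The weak condition only needs the adjacency lists of the two clusters. A minimal sketch, with the hypothetical function name `adm_weak` and a path graph as an assumed example:

```python
def adm_weak(adj, t, s):
    """True iff dist_G(t, s) > 1: t and s are disjoint and no edge connects them."""
    t, s = set(t), set(s)
    if t & s:
        return False                      # distance 0
    return not any(v in s for u in t for v in adj[u])


# path graph 0 - 1 - 2 - 3 - 4 - 5 with three clusters
adj = {k: [n for n in (k - 1, k + 1) if 0 <= n < 6] for k in range(6)}
t1, t2, t3 = {0, 1}, {4, 5}, {2, 3}
print(adm_weak(adj, t1, t2), adm_weak(adj, t1, t3))
```

The cost is proportional to the number of edges leaving $t$, which is what makes the test cheap.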
Standard Admissibility
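The standard admissibility is defined by
$\operatorname{adm}_{std}(t, s) := \begin{cases} \text{true}, & \text{if } \min(\operatorname{diam}_G(t), \operatorname{diam}_G(s)) \le \eta \operatorname{dist}_G(t, s), \\ \text{false}, & \text{otherwise}, \end{cases}$
i.e. the equivalent of the geometrical admissibility.
Since computing the diameter and the distance between clusters in $G$ costs $O(n^2)$, the admissibility is tested as:
• choose a node $i \in t$ and a node $j \in t$ with $\operatorname{dist}_G(i, j) = \max$ for this $i$,
• $\operatorname{diam}_G(t) \le 2 \operatorname{dist}_G(i, j) =: \widetilde{\operatorname{diam}}$,
• construct the surrounding $t'$ around $t$ in $G$ via $\frac{1}{\eta} \widetilde{\operatorname{diam}}$,
• if $t' \cap s = \emptyset$, $\operatorname{adm}_{std}(t, s) = \text{true}$.
[Figure: clusters $t_1$, $t_2$ and $t_3$]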
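The four steps of this test can be sketched with two breadth first searches. The names `bfs_dist` and `adm_std` are hypothetical, the start node $i$ is simply the smallest index in $t$, and the graph is assumed to be connected. As in the steps above, only the diameter of $t$ is estimated, so the test is sufficient but not necessary.

```python
from collections import deque
from math import floor


def bfs_dist(adj, sources, max_depth=None):
    """Multi-source BFS distances in G, optionally truncated at max_depth."""
    dist = {u: 0 for u in sources}
    queue = deque(sources)
    while queue:
        u = queue.popleft()
        if max_depth is not None and dist[u] >= max_depth:
            continue                      # do not expand beyond the radius
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def adm_std(adj, t, s, eta=2.0):
    """Sufficient test for diam_G(t) <= eta * dist_G(t, s)."""
    t, s = set(t), set(s)
    d = bfs_dist(adj, [min(t)])                 # distances from a node i in t
    diam_est = 2 * max(d[v] for v in t)         # diam_G(t) <= 2 dist_G(i, j)
    radius = floor(diam_est / eta)
    surrounding = bfs_dist(adj, t, max_depth=radius)   # t' around t
    return not (set(surrounding) & s)


adj = {k: [n for n in (k - 1, k + 1) if 0 <= n < 12] for k in range(12)}
print(adm_std(adj, {0, 1, 2}, {9, 10, 11}), adm_std(adj, {0, 1, 2}, {4, 5, 6}))
```

If $s$ lies outside the surrounding, then $\operatorname{dist}_G(t, s)$ exceeds the radius, so $\eta \operatorname{dist}_G(t, s)$ is larger than the diameter estimate.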
Numerical Examples
H-LU factorisation of Model Problem:

         |           Geometric             |           Black Box
N        | Time (sec)  Mem (MB)  δ         | Time (sec)  Mem (MB)  δ
253^2    |      3.8        76    2·10^-4   |      6.6        86    1·10^-4
358^2    |     10.0       169    1·10^-4   |     15.7       187    6·10^-5
511^2    |     24.1       374    7·10^-5   |     41.7       441    3·10^-5
729^2    |     61.1       840    4·10^-5   |    116.1      1020    1·10^-5
1023^2   |    144.9      1780    2·10^-5   |    250.8      2110    8·10^-6
40^3     |     79.1       285    1·10^-3   |    106.5       292    1·10^-3
51^3     |    194.5       634    1·10^-3   |    326.1       763    7·10^-4
64^3     |    520.3      1400    1·10^-3   |    896.4      1760    4·10^-4
81^3     |   1440.0      3560    5·10^-4   |   2444.8      4330    2·10^-4
102^3    |   3875.5      8070    4·10^-4   |   6575.7      9940    2·10^-4

The accuracy of H-arithmetics is defined by δ, which is chosen such that
$\|I - (L_H U_H)^{-1} A\|_2 \le 10^{-4}$
Nested Dissection
Vertex Separator
In nested dissection the two sub graphs of a partition have to be separated by a vertex separator.
[Figure: matrix graph with vertex separator and the corresponding matrix]
Especially suited are graph partitioning algorithms yielding a minimal edge-cut, since a small edge-cut leads to a small vertex separator and therefore maximises the size of the zero off-diagonal matrix blocks.
Constructing the Vertex Separator
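Let $V_1, V_2 \subset V$, $V_1 \cap V_2 = \emptyset$, be a partition of $G = (V, E)$ and let
$\mathcal{E} = \{(i, j) \in E : i \in V_1, j \in V_2\}$ be the edge-cut of $V_1, V_2$.
A vertex separator $\mathcal{V}$ for $V_1, V_2$ can be obtained by computing a vertex cover of $\mathcal{E}$, i.e. a set of nodes incident to all edges in $\mathcal{E}$.
Algorithm:
Start with $\mathcal{V} := \emptyset$ and loop while $\mathcal{E} \neq \emptyset$:
• choose $(i, j) \in \mathcal{E}$;
• choose $v \in \{i, j\}$ such that $v \in V'$ with $\#V' = \max_k \#V_k$;
• $\mathcal{V} := \mathcal{V} \cup \{v\}$; $V' := V' \setminus \{v\}$;
• $\mathcal{E} := \mathcal{E} \setminus \{(i', j') \in \mathcal{E} : v \in \{i', j'\}\}$;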
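The loop above translates almost directly into code. In this sketch the function name `vertex_separator` is hypothetical, and the edge chosen in each step is the smallest remaining one only to make the result reproducible; any edge of the cut would do.

```python
def vertex_separator(adj, part1, part2):
    """Greedy vertex cover of the edge-cut between part1 and part2."""
    part1, part2 = set(part1), set(part2)
    cut = {(i, j) for i in part1 for j in adj[i] if j in part2}
    separator = set()
    while cut:
        i, j = min(cut)                              # choose (i, j) in the edge-cut
        v = i if len(part1) >= len(part2) else j     # node from the larger set
        separator.add(v)
        (part1 if v == i else part2).discard(v)
        cut = {e for e in cut if v not in e}         # drop all edges covered by v
    return separator, part1, part2


# path graph 0 - 1 - 2 - 3 - 4 - 5 split in the middle
adj = {k: [n for n in (k - 1, k + 1) if 0 <= n < 6] for k in range(6)}
print(vertex_separator(adj, {0, 1, 2}, {3, 4, 5}))
```

Taking the node from the larger of the two sets keeps the remaining sub graphs balanced.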
Subdividing the Vertex Separator
In contrast to classical nested dissection, H-matrices also use a cluster tree for the indices in the vertex separator. Hence, further subdivision is necessary.
Problem: restricting $G$ to the nodes in $\mathcal{V}$ might remove important edges, e.g.
[Figure: restricted graph $G|_{\mathcal{V}}$]
Therefore, graph partitioning for the vertex separator is performed in the sub graph induced by $V_1$, $V_2$ and $\mathcal{V}$.
Modify the BFS based algorithm for the vertex separator:
• choose start nodes for BFS in $\mathcal{V}$,
• perform a BFS step only for the smaller node set to achieve balance,
• stop the BFS iteration when all nodes in $\mathcal{V}$ have been visited.
For further subdivision, only consider visited nodes to reduce complexity.
Remark
Still open: efficient construction of a minimal surrounding graph for the subdivision of the vertex separator.
Unfortunately, no graph partitioning packages, e.g. METIS or Scotch, are applicable to vertex separator partitioning.
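One possible reading of the modified BFS is sketched below. The name `split_separator` is hypothetical, the two start nodes are passed in explicitly, and "smaller node set" is interpreted as the side that has collected fewer separator nodes so far; the slides leave these details open. Separator nodes that are never reached stay unassigned in this sketch.

```python
def grid(rows, cols):
    """Adjacency lists of a rows x cols grid graph (nodes numbered row by row)."""
    return {r * cols + c: [rr * cols + cc
                           for rr, cc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1))
                           if 0 <= rr < rows and 0 <= cc < cols]
            for r in range(rows) for c in range(cols)}


def split_separator(adj, separator, start1, start2):
    """Split the separator by BFS in the surrounding graph, starting in the separator."""
    separator = set(separator)
    owner = {start1: 0, start2: 1}
    fronts = [[start1], [start2]]
    count = [1, 1]                        # separator nodes collected per side
    while sum(count) < len(separator) and (fronts[0] or fronts[1]):
        side = 0 if count[0] <= count[1] else 1   # advance only the smaller side
        if not fronts[side]:
            side = 1 - side
        new_front = []
        for u in fronts[side]:
            for v in adj[u]:
                if v not in owner:
                    owner[v] = side
                    new_front.append(v)
                    if v in separator:
                        count[side] += 1
        fronts[side] = new_front
    return [{v for v in separator if owner.get(v) == s} for s in (0, 1)]


adj = grid(3, 5)
middle_row = {5, 6, 7, 8, 9}              # separator between top and bottom row
print([sorted(p) for p in split_separator(adj, middle_row, 5, 9)])
```

The search runs through nodes outside the separator as well, so separator nodes that are only connected via $V_1$ or $V_2$ can still be reached.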
Numerical Experiments
H-LU factorisation of Model Problem using nested dissection:

         |           Geometric             |           Black Box
N        | Time (sec)  Mem (MB)  δ         | Time (sec)  Mem (MB)  δ
253^2    |      0.9        51    1·10^-3   |      1.3        47    3·10^-5
358^2    |      1.9        86    4·10^-4   |      2.9        94    2·10^-5
511^2    |      4.5       212    2·10^-4   |      6.5       198    9·10^-6
729^2    |      9.6       371    1·10^-4   |     15.0       402    5·10^-6
1023^2   |     20.2       878    6·10^-5   |     31.6       819    2·10^-6
40^3     |     12.6        99    1·10^-2   |     32.7       135    3·10^-4
51^3     |     46.9       300    3·10^-3   |     97.6       323    2·10^-4
64^3     |    117.4       592    2·10^-3   |    289.1       719    1·10^-4
81^3     |    269.8      1410    1·10^-3   |    804.3      1570    8·10^-5
102^3    |    752.3      3020    1·10^-3   |   1907.3      3370    6·10^-5

Again, the H-accuracy δ is chosen such that
$\|I - (L_H U_H)^{-1} A\|_2 \le 10^{-4}$
Comparison of algebraic H-LU factorisation with direct solvers for
$-\Delta u + \lambda u = f$ in $\Omega = [0, 1]^2$
[Figure: time for setup in sec. (axis 0 to 700) over no. of unknowns ($5.0 \cdot 10^5$ to $2.0 \cdot 10^6$) for H-Matrix, Pardiso, MUMPS, UMFPACK, SuperLU and Spooles]
Comparison of algebraic H-LU factorisation with direct solvers for
$-\Delta u + \lambda u = f$ in $\Omega = [0, 1]^3$
[Figure: time for setup in sec. (axis 0 to 18000) over no. of unknowns ($5.0 \cdot 10^5$ to $2.0 \cdot 10^6$) for H-Matrix, Pardiso, MUMPS, UMFPACK, SuperLU and Spooles]
Parallelisation
Direct Domain Decomposition
Graph $G$ is partitioned into $p$ sub graphs decoupled by a single vertex separator (shown for $p = 4$):
$A = \begin{pmatrix} A_{00} & & & & A_{04} \\ & A_{11} & & & A_{14} \\ & & A_{22} & & A_{24} \\ & & & A_{33} & A_{34} \\ A_{40} & A_{41} & A_{42} & A_{43} & A_{44} \end{pmatrix}$
Parallel H-LU factorisation on processor $i$:
1 factorise $A_{ii} = L_{ii} U_{ii}$, (seq. LU Fac.)
2 solve $A_{ip} = L_{ii} U_{ip}$ and $A_{pi} = L_{pi} U_{ii}$, (seq. Algo.)
3 compute and exchange $L_{pi} U_{ip}$, (log p steps)
4 update $A_{pp} = A_{pp} - \sum_i L_{pi} U_{ip}$, (seq. Matrix Mult.)
5 factorise $A_{pp} = L_{pp} U_{pp}$. (seq. LU Fac.)
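The five steps follow from comparing blocks in the LU factorisation of the arrow-shaped matrix. As a worked equation in the notation used above, with sub graphs indexed $i = 0, \dots, p-1$ and the vertex separator indexed $p$ (this is the standard block LU identity, not a formula taken from the slides):

```latex
A =
\begin{pmatrix}
L_{00} &        &             &        \\
       & \ddots &             &        \\
       &        & L_{p-1,p-1} &        \\
L_{p0} & \cdots & L_{p,p-1}   & L_{pp}
\end{pmatrix}
\begin{pmatrix}
U_{00} &        &             & U_{0p}    \\
       & \ddots &             & \vdots    \\
       &        & U_{p-1,p-1} & U_{p-1,p} \\
       &        &             & U_{pp}
\end{pmatrix},
\qquad
\begin{aligned}
A_{ii} &= L_{ii} U_{ii}, \\
A_{ip} &= L_{ii} U_{ip}, \qquad A_{pi} = L_{pi} U_{ii}, \\
A_{pp} &= \sum_{i=0}^{p-1} L_{pi} U_{ip} + L_{pp} U_{pp}.
\end{aligned}
```

The first three identities only involve data of sub graph $i$ and can be evaluated independently on each processor; the last one couples all processors through the Schur complement $A_{pp} - \sum_i L_{pi} U_{ip}$.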
For the complexity of the parallel H-LU factorisation in the model problem, we assume
• equal load of order $n/p$ per sub graph,
• the size $n_{\mathcal{V}}$ of the vertex separator is of optimal order $p^{1/d} n^{(d-1)/d}$.
Then one obtains:
$O\left( \frac{n \log^2 n}{p} + p^{1/d} n^{(d-1)/d} \log^2 n \log p \right)$
The speedup is limited by the size of the vertex separator, which increases with $p$.
[Figure: speedup (axis 2 to 12) over no. of processors (5 to 30) for $n = 2047^2$ and $n = 102^3$]
Nested Dissection
Graph $G$ is hierarchically partitioned with local vertex separators:
$A = \begin{pmatrix} A_{00} & & A_{02} \\ & A_{11} & A_{12} \\ A_{20} & A_{21} & A_{22} \end{pmatrix}$
Parallel H-LU factorisation is based on the algorithm for direct domain decomposition with $p = 2$:
1 choose $i \in \{0, 1\}$ such that $A_{ii}$ is on the local processor;
2 factorise $A_{ii} = L_{ii} U_{ii}$, (Recursion)
3 solve $A_{i2} = L_{ii} U_{i2}$ and $A_{2i} = L_{2i} U_{ii}$, (parallel Matrix Mult.)
4 compute and exchange $L_{2i} U_{i2}$,
5 update $A_{22} = A_{22} - \sum_i L_{2i} U_{i2}$, (seq. Matrix Mult.)
6 factorise $A_{22} = L_{22} U_{22}$. (seq. LU Fac.)
Data distribution onto $P := \{1, \dots, p\}$ processors follows the hierarchical decomposition during nested dissection:
• on level 0, all processors handle the matrix,
• on level 1, $P$ is split into two halves according to graph bisection,
• recursively divide the processor set.
[Figure: processor sets for $p = 4$: $\{1, \dots, 4\}$ on level 0, $\{1, 2\}$ and $\{3, 4\}$ on level 1, $\{1\}$, $\{2\}$, $\{3\}$, $\{4\}$ on level 2]
For processor $i$:
• only handle those matrices with processor set $P$, if $i \in P$,
• exchange data only with other processors in $P$.
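The recursive splitting of the processor set can be sketched as follows. The function names `processor_sets` and `handled_by` and the dictionary layout are assumptions made for illustration; the split simply halves the list of processors.

```python
def processor_sets(procs, level=0, tree=None):
    """Recursively halve the processor set; returns {level: [processor sets]}."""
    tree = {} if tree is None else tree
    tree.setdefault(level, []).append(list(procs))
    if len(procs) > 1:
        half = len(procs) // 2
        processor_sets(procs[:half], level + 1, tree)
        processor_sets(procs[half:], level + 1, tree)
    return tree


def handled_by(tree, i):
    """All (level, processor set) pairs whose matrices processor i has to handle."""
    return [(level, P) for level, sets in tree.items() for P in sets if i in P]


tree = processor_sets([1, 2, 3, 4])
print(tree)
print(handled_by(tree, 3))
```

Processor 3, for example, takes part on every level but only ever communicates with the processors in its current set.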
For the complexity of the parallel H-LU factorisation in the model problem, we again assume
• equal load of order $n/p$ per sub graph,
• local vertex separators of minimal order w.r.t. the dimension $d$.
Then one obtains:
$O\left( \frac{n \log^2 n}{p} + n^{(d-1)/d} \log^2 n \log p \right)$
The speedup is now limited by the size $O(n^{(d-1)/d})$ of the first vertex separator and much less dependent on $p$.
[Figure: speedup (axis 5 to 30) over no. of processors (5 to 30) for $n = 2895^2$ and $n = 128^3$]
Literature
L. Grasedyck, R. Kriemann and S. Le Borne, Domain Decomposition Based H-LU Preconditioning, to appear in “Numerische Mathematik”.
L. Grasedyck, R. Kriemann and S. Le Borne, Parallel Black Box H-LU Preconditioning for Elliptic Boundary Value Problems, “Computing and Visualization in Science”, 11(4-6), pp. 273–291, 2008.
G. Karypis and V. Kumar, A fast and high quality multilevel scheme for partitioning irregular graphs, “SIAM Journal on Scientific Computing”, 20(1), pp. 359–392, 1999.
C.M. Fiduccia and R.M. Mattheyses, A linear-time heuristic for improving network partitions, in “Proceedings of the 19th Design Automation Conference”, pp. 175–181, IEEE, 1982.