Electronic Transactions on Numerical Analysis.
Volume 3, pp. 24-38, March 1995.
Copyright © 1995, Kent State University.
ISSN 1068-9613.
Kent State University
etna@mcs.kent.edu
PARALLEL, SYNCHRONOUS AND ASYNCHRONOUS TWO-STAGE
MULTISPLITTING METHODS ∗
RAFAEL BRU†, VIOLETA MIGALLÓN‡, JOSÉ PENADÉS‡, AND DANIEL B. SZYLD§
Abstract. Different types of synchronous and asynchronous two-stage multisplitting algorithms for the solution of linear systems are analyzed. Algorithms that have appeared in the literature are reviewed, and new ones are presented. Convergence properties of these algorithms are studied when the matrix in question is either monotone or an H-matrix. Relaxed versions of these algorithms are also studied. Computational experiments on a shared memory multiprocessor vector computer are presented.
Key words. asynchronous methods, two-stage iterative methods, linear systems, multisplittings, parallel algorithms.
AMS subject classifications. 65F10, 65F15.
1. Introduction. In this paper we present various types of synchronous and asynchronous two-stage multisplitting algorithms for the solution of linear systems of the form

(1.1)   $Ax = b,$

where A is an $n \times n$ nonsingular matrix. We first give a general convergence result, and then, for the most part, concentrate our study on two important cases: when A is monotone, i.e., when $A^{-1} \ge O$ [1], and when A is an H-matrix [30], [41]. The multisplitting algorithm was introduced by O'Leary and White [28] and was further studied by many authors; see, e.g., Frommer and Mayer [10], [11], Neumann and Plemmons [26], or White [42], [44].
One can think of the multisplitting algorithm as an extension and parallel generalization of the classical Block Jacobi algorithm, see, e.g., Varga [40], which we review in what follows. Let the matrix A be partitioned into $L \times L$ blocks

(1.2)   $\begin{pmatrix} A_{11} & A_{12} & \cdots & A_{1L} \\ A_{21} & A_{22} & \cdots & A_{2L} \\ \vdots & \vdots & & \vdots \\ A_{L1} & A_{L2} & \cdots & A_{LL} \end{pmatrix},$

with the diagonal blocks $A_{\ell\ell}$ being square nonsingular of order $n_\ell$, $\ell = 1, \ldots, L$, $\sum_{\ell=1}^{L} n_\ell = n$, and the vectors x and b are partitioned conformally.
∗ Received October 3, 1994. Accepted February 27, 1995. Communicated by Richard Varga.
† Departament de Matemàtica Aplicada, Universitat Politècnica de València, E-46071 València, Spain (rbru@mat.upv.es). This research was partially supported by both Spanish CICYT grant number TIC91-1157-C03-01 and the ESPRIT III basic research programme of the EC under contract No. 9072 (project GEPPCOM).
‡ Departament de Tecnologia Informàtica i Computació, Universitat d'Alacant, E-03080 Alacant, Spain (violeta@dtic.ua.es, jpenades@dtic.ua.es). This research was supported by Spanish CICYT grant number TIC91-1157-C03-01.
§ Department of Mathematics, Temple University, Philadelphia, Pennsylvania 19122-2585, USA (szyld@euclid.math.temple.edu). This research was supported by the National Science Foundation grant DMS-9201728.
Algorithm 1. (Block Jacobi). Given the initial vector $x_0^T = [(x_0^{(1)})^T, \ldots, (x_0^{(L)})^T]$.
For $i = 1, 2, \ldots,$ until convergence.
    For $\ell = 1$ to $L$

(1.3)   $A_{\ell\ell}\, x_i^{(\ell)} = b^{(\ell)} - \sum_{k=1,\, k \neq \ell}^{L} A_{\ell k}\, x_{i-1}^{(k)}.$

Each system (1.3) can be solved in parallel by a different processor of a parallel computer, and the vector iterate at each step is

$x_i^T = [(x_i^{(1)})^T, (x_i^{(2)})^T, \ldots, (x_i^{(L)})^T].$
The multisplitting algorithm [28] consists of having a collection of splittings

(1.4)   $A = M_\ell - N_\ell, \qquad \ell = 1, \ldots, L,$

and diagonal nonnegative weighting matrices $E_\ell$ which add to the identity, and performing the following algorithm.

Algorithm 2. (Multisplitting). Given the initial vector $x_0$.
For $i = 1, 2, \ldots,$ until convergence.
    For $\ell = 1$ to $L$

(1.5)   $M_\ell\, y_\ell = N_\ell\, x_{i-1} + b$

(1.6)   $x_i = \sum_{\ell=1}^{L} E_\ell\, y_\ell.$
As can be seen, Algorithm 1 is a special case of Algorithm 2 when all splittings are the same, namely, $M_\ell = \mathrm{Diag}(A_{11}, \ldots, A_{LL})$, the block-diagonal matrix consisting of the diagonal blocks of A, and the diagonal weighting matrices $E_\ell$ have ones in the entries corresponding to the diagonal block $A_{\ell\ell}$ and zeros otherwise. In this case we say that the matrices $E_\ell$ form a partition of the identity. A rendition of Algorithm 1 can also be obtained from Algorithm 2 by setting

(1.7)   $M_\ell = \mathrm{Diag}(I, \ldots, I, A_{\ell\ell}, I, \ldots, I),$ and
(1.8)   $E_\ell = \mathrm{Diag}(O, \ldots, O, I, O, \ldots, O),$

for $\ell = 1, \ldots, L$, i.e., the same partition of the identity just discussed.
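The general scheme of Algorithm 2 can be sketched in the same style. The sketch below is an illustration under the assumptions stated in the paper for the weights (diagonal, nonnegative, summing to the identity); the particular splittings `Ms` passed in are hypothetical choices, not the paper's.

```python
import numpy as np

def multisplitting_step(A, b, x, Ms, Es):
    """One outer step of Algorithm 2: for each splitting A = M_l - N_l,
    solve M_l y_l = N_l x + b (these solves are independent and could run
    in parallel), then combine with the weights: x_next = sum_l E_l y_l."""
    x_next = np.zeros_like(x)
    for M, E in zip(Ms, Es):
        N = M - A                          # N_l from A = M_l - N_l
        y = np.linalg.solve(M, N @ x + b)  # exact solve of (1.5)
        x_next += E @ y                    # (1.6); zero-weight rows of y are not needed
    return x_next
```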
Convergence of the multisplitting algorithm was first established for $A^{-1} \ge O$ by O'Leary and White [28] when the splittings (1.4) are weak regular, i.e., when $M_\ell^{-1} \ge O$ and $M_\ell^{-1} N_\ell \ge O$, where the inequalities are understood component-wise [1], [29], [40]. Comparison of convergence of different splittings (1.4) when they are M-splittings, i.e., when $M_\ell$ is an M-matrix (defined in the next section) and $N_\ell \ge O$ [23], [35], was studied by Neumann and Plemmons [26]. If the splittings (1.4) correspond to diagonal blocks of A of larger size than the corresponding number of ones in the partition of the identity, i.e., if $A_{\ell\ell}$ in (1.7) has order $\tilde{n}_\ell > n_\ell$, the order of I in (1.8), then the multisplitting Algorithm 2 corresponds to an overlapping algorithm, which was shown to have better asymptotic convergence rate than the Block Jacobi algorithm (without overlap); see Frommer and Pohl [12], [13], and also Jones and Szyld [21]. Weighting matrices which are not partitions of the identity, and therefore induce another type
of overlap, have also been studied; see, e.g., O'Leary and White [28], White [43], or Frommer and Mayer [11].
As is the case for Block Jacobi, the systems (1.5) can be solved by different processors in parallel. Moreover, the components of $y_\ell$ corresponding to zeros in the diagonal of $E_\ell$ need not be computed; see (1.6).
When the linear systems (1.5), or (1.3), are not solved exactly, but rather their solutions approximated by iterative methods, we are in the presence of a two-stage method. Two-stage methods, sometimes called inner-outer iterations, were studied, e.g., by Nichols [27], Golub and Overton [17], [18], Lanzkron, Rose and Szyld [22], Frommer and Szyld [14], [15], Bru, Elsner and Neumann [3], and Bru, Migallón and Penadés [6]. When the systems (1.5) are solved iteratively in each processor, using the splittings

(1.9)   $M_\ell = B_\ell - C_\ell, \qquad \ell = 1, \ldots, L,$

and performing a fixed number s of iterations, one obtains the following algorithm.
Algorithm 3. (Two-stage Multisplitting). Given the initial vector $x_0$, and the fixed number s of inner iterations.
For $i = 1, 2, \ldots,$ until convergence.
    For $\ell = 1$ to $L$
        $y_{\ell,0} = x_{i-1}$
        For $j = 1$ to $s$

(1.10)   $B_\ell\, y_{\ell,j} = C_\ell\, y_{\ell,j-1} + N_\ell\, x_{i-1} + b$

(1.11)   $x_i = \sum_{\ell=1}^{L} E_\ell\, y_{\ell,s}.$
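A sketch of one outer step of Algorithm 3 follows. Purely for illustration, the inner splitting (1.9) is taken to be the point Jacobi splitting of $M_\ell$ (with $B_\ell$ the diagonal of $M_\ell$); the paper leaves the inner splittings general.

```python
import numpy as np

def two_stage_step(A, b, x, Ms, Es, s):
    """One outer step of Algorithm 3: s inner iterations (1.10) per splitting.
    Inner splitting M_l = B_l - C_l with B_l = diag(M_l) (Jacobi, illustrative)."""
    x_next = np.zeros_like(x)
    for M, E in zip(Ms, Es):
        N = M - A
        B = np.diag(np.diag(M))   # B_l of the inner splitting (1.9)
        C = B - M                 # C_l = B_l - M_l
        y = x.copy()              # y_{l,0} = x_{i-1}
        for _ in range(s):
            # (1.10): B_l y_{l,j} = C_l y_{l,j-1} + N_l x_{i-1} + b
            y = np.linalg.solve(B, C @ y + N @ x + b)
        x_next += E @ y           # (1.11)
    return x_next
```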
Convergence of this algorithm for any number of inner iterations s was established for $A^{-1} \ge O$ by Szyld and Jones [38] when the outer splittings (1.4) are regular splittings, i.e., when $M_\ell^{-1} \ge O$ and $N_\ell \ge O$ [1], [40], and the inner splittings (1.9) are weak regular splittings.

Algorithm 3 reduces to Algorithm 2 when the inner splitting (1.9) is the trivial $M_\ell = M_\ell - O$, and $s = 1$. We also note that, under certain circumstances, if the number of inner iterations is large enough, i.e., as $s \to \infty$, the iterates produced by Algorithm 3 resemble those produced by Algorithm 2, i.e., $y_{\ell,s} \to y_\ell$, as $s \to \infty$, where $y_\ell$ is the solution of (1.5); cf. [14, section 2] and the proof of Theorem 3.1.
When the number of inner iterations varies for each splitting and for each outer iteration, i.e., when $s = s(\ell, i)$ in Algorithm 3, we say that we have a Non-stationary Two-stage Multisplitting Algorithm (Algorithm 4). Model A in Bru, Elsner and Neumann [2] is a special case of this algorithm, when the outer splittings (1.4) are all $A = A - O$.
A relaxation parameter $\omega > 0$ can be introduced and replace the computation of $y_{\ell,j}$ in (1.10) with the equation

(1.12)   $B_\ell\, y_{\ell,j} = \omega(C_\ell\, y_{\ell,j-1} + N_\ell\, x_{i-1} + b) + (1 - \omega)B_\ell\, y_{\ell,j-1}.$

This is equivalent to replacing the splitting (1.9) by $M_\ell = \frac{1}{\omega}B_\ell - \left(\frac{1-\omega}{\omega}B_\ell + C_\ell\right)$. Clearly, with $\omega = 1$, equation (1.10) is recovered. In the case of $\omega \neq 1$ we have a Relaxed Non-stationary Two-stage Multisplitting Algorithm (Algorithm 5); see also Bru and Fuster [4], Bru, Migallón and Penadés [6], Deren [7], Frommer and Mayer [10], Fuster, Migallón and Penadés [16], and Sun [36], where relaxed (or extrapolated) algorithms are considered.
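In code, the relaxation only changes the inner-loop body. A minimal sketch of (1.12), with hypothetical arguments for the splitting matrices of one fixed $\ell$:

```python
import numpy as np

def relaxed_inner(B, C, N, b, x_outer, y, s, omega):
    """s relaxed inner iterations (1.12); omega = 1.0 recovers (1.10).
    B, C: inner splitting of M; N: outer splitting matrix; y: y_{l,0}."""
    for _ in range(s):
        y = np.linalg.solve(B, omega * (C @ y + N @ x_outer + b)
                               + (1.0 - omega) * (B @ y))
    return y
```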
In this paper we study the convergence of the Non-stationary Two-stage Multisplitting Algorithm 4, together with its relaxed version, Algorithm 5; see section 3. We also study their extension to two different asynchronous algorithms, where the (approximate) solutions of the systems (1.5), by repeated solutions of (1.10) or (1.12), proceed in each processor without waiting for the completion of the computation of the iterates in the other processors; see section 4.

For each equation (1.12) one can work with a different relaxation parameter $\omega_\ell$, $\ell = 1, \ldots, L$, as is done, e.g., in the MSOR method; see, e.g., [19]. We point out that the convergence results in this paper can be readily extended to this multi-parameter case.
In the next section we present some notation, definitions and preliminary results
which we refer to later, while in section 5 we present some numerical experiments,
which illustrate the performance of the algorithms studied.
Some of the results in this paper were presented in preliminary form in [5] and
[37].
2. Preliminaries. We say that a vector x is nonnegative (positive), denoted $x \ge 0$ ($x > 0$), if all its entries are nonnegative (positive). Similarly, a matrix B is said to be nonnegative, denoted $B \ge O$, if all its entries are nonnegative or, equivalently, if it leaves invariant the set of all nonnegative vectors. We write $A \ge B$ for two matrices when $A - B \ge O$, and $x \ge y$ ($x > y$) for two vectors when $x - y \ge 0$ ($x - y > 0$). Given a matrix $A = (a_{ij})$, we define the matrix $|A| = (|a_{ij}|)$. It follows that $|A| \ge O$ and that $|AB| \le |A|\,|B|$ for any two matrices A and B of compatible size.
Let $Z^{n\times n}$ denote the set of all real $n \times n$ matrices which have all non-positive off-diagonal entries. A nonsingular matrix $A \in Z^{n\times n}$ is called an M-matrix if $A^{-1} \ge O$, i.e., if A is a monotone matrix; see, e.g., Berman and Plemmons [1] or Varga [40]. By $\rho(A)$ we denote the spectral radius of the square matrix A.
For any matrix $A = (a_{ij}) \in \mathbb{R}^{n\times n}$, we define its comparison matrix $\langle A\rangle = (\alpha_{ij})$ by $\alpha_{ii} = |a_{ii}|$, $\alpha_{ij} = -|a_{ij}|$, $i \neq j$. Following Ostrowski [30], [31], a nonsingular matrix A is said to be an H-matrix if $\langle A\rangle$ is an M-matrix. Of course, M-matrices are special cases of H-matrices. H-matrices, which are not necessarily monotone, arise in many applications and were studied by a number of authors in connection with iterative solutions of linear systems; see, e.g., the classical paper by Varga [41], or Frommer and Szyld [14] for an extensive bibliography and some examples.
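These definitions are straightforward to test numerically. The sketch below builds $\langle A\rangle$ and checks the H-matrix property via the characterization that a nonsingular matrix in $Z^{n\times n}$ is an M-matrix exactly when its inverse is nonnegative; the tolerance is an assumption to absorb rounding.

```python
import numpy as np

def comparison_matrix(A):
    """<A>: alpha_ii = |a_ii|, alpha_ij = -|a_ij| for i != j."""
    C = -np.abs(A)
    np.fill_diagonal(C, np.abs(np.diag(A)))
    return C

def is_H_matrix(A, tol=1e-12):
    """A is an H-matrix iff <A> is an M-matrix, i.e., <A> (which lies in
    Z^{n x n} by construction) is nonsingular with <A>^{-1} >= O entrywise."""
    cA = comparison_matrix(A)
    try:
        inv = np.linalg.inv(cA)
    except np.linalg.LinAlgError:
        return False
    return bool(np.all(inv >= -tol))
```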
Lemma 2.1. Let $A, B \in \mathbb{R}^{n\times n}$.
(a) If A is an M-matrix, $B \in Z^{n\times n}$, and $A \le B$, then B is an M-matrix.
(b) If A is an H-matrix, then $|A^{-1}| \le \langle A\rangle^{-1}$.
(c) If $|A| \le B$ then $\rho(A) \le \rho(B)$.
Proof. (a) and (c) can be found, e.g., in [29], 2.4.10 and 2.4.9, respectively. Part (b) goes back to Ostrowski [30]; see also, e.g., Neumaier [25].
Definition 2.2. Let $A \in \mathbb{R}^{n\times n}$. The representation $A = M - N$ is called a splitting if M is nonsingular. It is called a convergent splitting if $\rho(M^{-1}N) < 1$. A splitting $A = M - N$ is called
(a) regular if $M^{-1} \ge O$ and $N \ge O$ [39], [40],
(b) weak regular if $M^{-1} \ge O$ and $M^{-1}N \ge O$ [1], [29],
(c) an H-splitting if $\langle M\rangle - |N|$ is an M-matrix [14], and
(d) an H-compatible splitting if $\langle A\rangle = \langle M\rangle - |N|$ [14].
Lemma 2.3. Let $A = M - N$ be a splitting.
(a) If the splitting is weak regular, then $\rho(M^{-1}N) < 1$ if and only if $A^{-1} \ge O$.
(b) If the splitting is an H-splitting, then A and M are H-matrices and $\rho(M^{-1}N) \le \rho(\langle M\rangle^{-1}|N|) < 1$.
(c) If the splitting is an H-compatible splitting and A is an H-matrix, then it is an H-splitting and thus convergent.
Proof. (a) can be found, e.g., in [1], [29], [40]. The first part of (b) was shown in [24], [25]. The second part, as well as (c), is found in [14].
Lemma 2.4. [34] Let $H_1, H_2, \ldots, H_i, \ldots$ be a sequence of nonnegative matrices in $\mathbb{R}^{n\times n}$. If there exist a real number $0 \le \theta < 1$ and a vector $v > 0$ in $\mathbb{R}^n$ such that $H_j v \le \theta v$, for all $j = 1, 2, \ldots$, then $\rho(V_i) \le \theta^i < 1$, where $V_i = H_i \cdots H_2 H_1$, and $\lim_{i\to\infty} V_i = O$.
3. Synchronous Iterations. In this section we give convergence results for the Non-stationary Two-stage Multisplitting Algorithm 4 and its relaxed variant, Algorithm 5. We first show that if the outer and inner splittings (1.4) and (1.9) are convergent, and if enough inner iterations are performed, then the algorithms converge. Later we show that if certain conditions are imposed on the splittings, the methods converge for any number of inner iterations.
Let $x^\star$ be the solution of (1.1) and let $e_i = x^\star - x_i$ be the error at the ith outer iteration of Algorithm 4. Let $R_\ell = B_\ell^{-1}C_\ell$. We rewrite (1.11) as

$x_i = \sum_{\ell=1}^{L} E_\ell \Big[ R_\ell^{s(\ell,i)} x_{i-1} + \sum_{j=0}^{s(\ell,i)-1} R_\ell^j B_\ell^{-1}(N_\ell x_{i-1} + b) \Big];$

cf. [15], [22]. Thus,

(3.1)   $e_i = H(i)e_{i-1} = H(i)H(i-1)\cdots H(1)e_0,$

where

(3.2)   $H(i) = \sum_{\ell=1}^{L} E_\ell \Big[ R_\ell^{s(\ell,i)} + \sum_{j=0}^{s(\ell,i)-1} R_\ell^j B_\ell^{-1} N_\ell \Big] = \sum_{\ell=1}^{L} E_\ell \big[ R_\ell^{s(\ell,i)} + (I - R_\ell^{s(\ell,i)}) M_\ell^{-1} N_\ell \big] = I - Q(i)A,$

with

(3.3)   $Q(i) = \sum_{\ell=1}^{L} E_\ell (I - R_\ell^{s(\ell,i)}) M_\ell^{-1} = \sum_{\ell=1}^{L} E_\ell \sum_{j=0}^{s(\ell,i)-1} R_\ell^j B_\ell^{-1} \ge O.$
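The identity $H(i) = I - Q(i)A$ can be verified numerically on small examples. The sketch below assembles H(i) and Q(i) under the same illustrative choices as before (Jacobi inner splittings, a hypothetical assumption); it also assumes the $E_\ell$ sum to the identity, which the derivation of (3.2) uses.

```python
import numpy as np

def iteration_matrix(A, Ms, Es, s_li):
    """Assemble H(i) of (3.2) and Q(i) of (3.3); s_li[l] = s(l, i).
    Assumes sum_l E_l = I and, illustratively, B_l = diag(M_l)."""
    n = A.shape[0]
    I = np.eye(n)
    H = np.zeros((n, n))
    Q = np.zeros((n, n))
    for l, (M, E) in enumerate(zip(Ms, Es)):
        N = M - A
        B = np.diag(np.diag(M))
        R = np.linalg.solve(B, B - M)             # R_l = B_l^{-1} C_l
        Rs = np.linalg.matrix_power(R, s_li[l])   # R_l^{s(l,i)}
        H += E @ (Rs + (I - Rs) @ np.linalg.solve(M, N))
        Q += E @ (I - Rs) @ np.linalg.inv(M)
    return H, Q  # H should equal I - Q @ A up to rounding error
```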
Similarly, if $e_i = x^\star - x_i$ is the error at the ith outer iteration of Algorithm 5 and

(3.4)   $S_\ell = S_\ell(\omega) = (1 - \omega)I + \omega R_\ell,$

it follows from (1.12) and (1.11) that $e_i = T(i)e_{i-1} = T(i)T(i-1)\cdots T(1)e_0$, where

(3.5)   $T(i) = \sum_{\ell=1}^{L} E_\ell \Big[ S_\ell^{s(\ell,i)} + \omega \sum_{j=0}^{s(\ell,i)-1} S_\ell^j B_\ell^{-1} N_\ell \Big],$

i.e., the iteration matrix for Algorithm 5 is similar to (3.2) of Algorithm 4, where $R_\ell$ is replaced by $S_\ell$.
Theorem 3.1. Let A be nonsingular. Let the splittings (1.4) be such that

(3.6)   $\|M_\ell^{-1} N_\ell\|_\infty < 1, \qquad \ell = 1, \ldots, L,$

and let the splittings (1.9) be convergent. Assume further that $\lim_{i\to\infty} s(\ell, i) = \infty$, $\ell = 1, \ldots, L$. Then, the Non-stationary Two-stage Multisplitting Algorithm 4 converges to $x^\star$ with $Ax^\star = b$. If, in addition, we assume that $0 < \omega < \frac{2}{1+\rho}$, where $\rho = \max\{\rho(B_\ell^{-1}C_\ell),\ 1 \le \ell \le L\}$, then the theorem holds for the Relaxed Non-stationary Two-stage Multisplitting Algorithm 5.
Proof. Consider first $\omega = 1$, i.e., Algorithm 4. Let $R_\ell = B_\ell^{-1}C_\ell$. Since $\rho(R_\ell) < 1$, $\lim_{i\to\infty} R_\ell^i = O$, for $\ell = 1, \ldots, L$. Then, given an $\epsilon > 0$, there exists an integer $s_0$ such that $\|R_\ell^s\|_\infty \le \epsilon$, for all $s \ge s_0$, $\ell = 1, \ldots, L$. Since $\lim_{i\to\infty} s(\ell,i) = \infty$, there exists an $i_0$ such that $\|R_\ell^{s(\ell,i)}\|_\infty \le \epsilon$, for all $i \ge i_0$, $\ell = 1, \ldots, L$. Let $\beta$ be a real constant such that $\|M_\ell^{-1}N_\ell\|_\infty \le \beta < 1$, for $\ell = 1, \ldots, L$. The existence of such $\beta$ follows from (3.6). Then, for $i \ge i_0$,

$\|H(i)\|_\infty \le \max_{1\le\ell\le L} \big\| R_\ell^{s(\ell,i)} + (I - R_\ell^{s(\ell,i)}) M_\ell^{-1} N_\ell \big\|_\infty \le \max_{1\le\ell\le L} \big[ \|R_\ell^{s(\ell,i)}\|_\infty + \big(1 + \|R_\ell^{s(\ell,i)}\|_\infty\big) \|M_\ell^{-1}N_\ell\|_\infty \big] \le \max_{1\le\ell\le L} \big[\epsilon + (1 + \epsilon)\|M_\ell^{-1}N_\ell\|_\infty\big] \le \epsilon + (1 + \epsilon)\beta \equiv \alpha.$

Setting $\epsilon < \frac{1-\beta}{1+\beta}$, we have $\alpha < 1$ and the errors (3.1) converge to zero.

For the case of Algorithm 5, i.e., for $\omega \neq 1$, observe that it follows from (3.4) that

(3.7)   $\rho(S_\ell(\omega)) < 1, \quad \text{for } \ell = 1, \ldots, L,$

and the proof follows in the same way.
The proof of Theorem 3.1 resembles that of [14, Theorem 2.4]. Here we need the additional hypothesis (3.6) since we have L different splittings. We also note that Theorem 3.1 can be proved in the same way if the norm in assumption (3.6) is replaced by any norm such that for arbitrary matrices U, $U_\ell$, and weighting matrices $E_\ell$, $\ell = 1, \ldots, L$, with $U = \sum_{\ell=1}^{L} E_\ell U_\ell$, one has $\|U\| \le \max_{1\le\ell\le L} \|U_\ell\|$; see Bru and Fuster [4]. In particular, one can use any weighted max-norm associated with a positive vector; see, e.g., Householder [20], Rheinboldt and Vandergraft [33], or Frommer and Szyld [15], for descriptions and applications of these norms.
Theorem 3.2. Let $A^{-1} \ge O$. Let the splittings (1.4) be regular and the splittings (1.9) be weak regular. Then, the Non-stationary Two-stage Multisplitting Algorithm 4 converges to $x^\star$ with $Ax^\star = b$ for any initial vector $x_0$ and any sequence of numbers of inner iterations $s(\ell,i) \ge 1$, $\ell = 1, \ldots, L$, $i = 1, 2, \ldots$. If, in addition, we assume $0 < \omega < 1$, the theorem holds for the Relaxed Non-stationary Two-stage Multisplitting Algorithm 5.
Proof. Let $0 < \omega \le 1$. Algorithm 4 corresponds to $\omega = 1$, while Algorithm 5 corresponds to $\omega < 1$. Let $e_i = x^\star - x_i$ be the error at the ith outer iteration of either Algorithm 4 or Algorithm 5. We rewrite (3.5) as

(3.8)   $T(i) = \sum_{\ell=1}^{L} E_\ell\, T_\ell(i),$

where

(3.9)   $T_\ell(i) = S_\ell^{s(\ell,i)} + \omega \sum_{j=0}^{s(\ell,i)-1} S_\ell^j B_\ell^{-1} N_\ell \ge O,$

where the inequality follows from the assumptions on the outer and inner splittings. Similarly to (3.2) and (3.3), it follows from (3.9) that

(3.10)   $T_\ell(i) = I - P_\ell(i)A,$

where $P_\ell(i) = (I - S_\ell^{s(\ell,i)}) M_\ell^{-1} = \sum_{j=0}^{s(\ell,i)-1} S_\ell^j (I - S_\ell) M_\ell^{-1}$. From (3.4) it follows that $I - S_\ell = \omega B_\ell^{-1} M_\ell$, and thus,

(3.11)   $P_\ell(i) = \omega \sum_{j=0}^{s(\ell,i)-1} S_\ell^j B_\ell^{-1}.$

Consider any fixed vector $e > 0$ (e.g., with all components equal to 1), and $v = A^{-1}e$. Since $A^{-1} \ge O$ and no row of $A^{-1}$ can have all null entries, we get $v > 0$. By the same arguments $B_\ell^{-1} e > 0$, $\ell = 1, \ldots, L$. We have from (3.10) and (3.11) that

$T_\ell(i)v = (I - P_\ell(i)A)v = v - P_\ell(i)e = v - \omega B_\ell^{-1} e - \omega \sum_{j=1}^{s(\ell,i)-1} S_\ell^j B_\ell^{-1} e.$

We have that $\sum_{j=1}^{s(\ell,i)-1} S_\ell^j B_\ell^{-1} e \ge 0$. Moreover, since $T_\ell(i)v \ge 0$ and $v - \omega B_\ell^{-1} e < v$, there exist constants $0 \le \theta_\ell < 1$ such that $v - \omega B_\ell^{-1} e \le \theta_\ell v$. Thus, $T_\ell(i)v \le \theta_\ell v$ for all $\ell = 1, \ldots, L$, $i = 1, 2, \ldots$. Let $\theta = \max\{\theta_\ell,\ 1 \le \ell \le L\}$. From (3.8) we then have that

(3.12)   $T(i)v \le \theta v, \quad \text{for all } i = 1, 2, \ldots.$

By Lemma 2.4 this implies that the product $V(i) = T(i) \cdot T(i-1) \cdots T(1)$ tends to zero as $i \to \infty$, and thus $\lim_{i\to\infty} e_i = 0$.
The proof of Theorem 3.2 uses techniques similar to the ones used in Theorem 2.1 of [2], in Theorems 4.3 and 4.4 of [14], and in Theorem 2.2 of [15]. We point out that the bounds (3.12) are independent of the sequence $s(\ell,i) \ge 1$, $\ell = 1, \ldots, L$, $i = 1, 2, \ldots$.

Theorem 3.3. Let A be an H-matrix. Let the splittings (1.4) and (1.9) be H-compatible splittings. Then, the Non-stationary Two-stage Multisplitting Algorithm 4 converges to $x^\star$ with $Ax^\star = b$ for any initial vector $x_0$ and any sequence of numbers of inner iterations $s(\ell,i) \ge 1$, $\ell = 1, \ldots, L$, $i = 1, 2, \ldots$. If, in addition, we assume $0 < \omega < 1$, the theorem holds for the Relaxed Non-stationary Two-stage Multisplitting Algorithm 5.
Proof. Let $0 < \omega \le 1$. Algorithm 4 corresponds to $\omega = 1$, while Algorithm 5 corresponds to $\omega < 1$. Let $e_i = x^\star - x_i$ be the error at the ith outer iteration of either Algorithm 4 or Algorithm 5. From (3.4), using Lemma 2.1 (b), we obtain the following bound:

$|S_\ell| \le (1-\omega)I + \omega|B_\ell^{-1}||C_\ell| \le (1-\omega)I + \omega\langle B_\ell\rangle^{-1}|C_\ell| \equiv \hat{S}_\ell.$

Thus, from (3.5), and again using Lemma 2.1 (b), we obtain

(3.13)   $|T(i)| \le \sum_{\ell=1}^{L} E_\ell \Big[ \hat{S}_\ell^{s(\ell,i)} + \omega \sum_{j=0}^{s(\ell,i)-1} \hat{S}_\ell^j \langle B_\ell\rangle^{-1} |N_\ell| \Big] \equiv \hat{T}(i).$

The matrix $\hat{T}(i)$ is the iteration matrix corresponding to the ith iteration of a Relaxed Non-stationary Two-stage Multisplitting Algorithm for the monotone matrix $\langle A\rangle = \langle M_\ell\rangle - |N_\ell|$ with the regular splittings $\langle M_\ell\rangle - |N_\ell|$ and $\langle M_\ell\rangle = \langle B_\ell\rangle - |C_\ell|$. These matrices and splittings satisfy the hypotheses of Theorem 3.2 and we have, as in (3.12),

(3.14)   $|T(i)|v \le \hat{T}(i)v \le \theta v, \quad \text{for all } i = 1, 2, \ldots,$

for some $v \in \mathbb{R}^n$, $v > 0$, and $\theta \in [0,1)$. Let $V(i) = T(i) \cdot T(i-1) \cdots T(1)$. We can bound $|V(i)| \le |T(i)| \cdot |T(i-1)| \cdots |T(1)|$. Therefore, by (3.14) and Lemma 2.4, V(i) tends to zero as $i \to \infty$, implying $\lim_{i\to\infty} e_i = 0$.

The proof of Theorem 3.3 follows from that of Theorem 3.2 in a way similar to the way that [15, Theorem 2.3] follows from [15, Theorem 2.2]. Here we need the additional hypothesis that the outer splittings be H-compatible, so that $\langle A\rangle = \langle M_\ell\rangle - |N_\ell|$ for all $\ell = 1, \ldots, L$.
4. Asynchronous Iterations. All methods described in section 1 and studied in section 3 are synchronous in the sense that step (1.11) is performed only after all approximations to the solutions of (1.5) are completed ($\ell = 1, \ldots, L$). Alternatively, if the weighting matrices form a partition of the identity, each part of $x_i$, say $x_i^{(\ell)} = E_\ell x_i$, can be updated as soon as the approximation to the solution of the corresponding system (1.5) is completed, without waiting for the other parts of $x_i$ to be updated. Thus, the previous iterate $x_{i-1}$ is no longer available for the computation of (1.5) or (1.10). Instead, parts of the current iterate are updated using a vector composed of parts of different previous, not necessarily the latest, iterates; cf. [2, Model B], [15, section 3], [16, section 4].

As is customary in the description and analysis of asynchronous algorithms, the iteration subscript is increased every time any part of the iteration vector is computed; see, e.g., the references in [2], [6], [8], [9], [15], [16]. In a formal way, the sets $J_i \subseteq \{1, 2, \ldots, L\}$, $i = 1, 2, \ldots$, are defined by $\ell \in J_i$ if the $\ell$th part of the iteration vector is computed at the ith step. The subscripts $r(k,i)$ are used to denote the iteration number of the kth part being used in the computation of any part in the ith iteration, i.e., the iteration number of the kth part available at the beginning of the computation of $x_i^{(\ell)}$, if $\ell \in J_i$.
Any arbitrary $n \times n$ matrix U can be decomposed into L $n \times n$ matrices $U^{(\ell)}$, $\ell = 1, \ldots, L$, so that for any vector u, $Uu = \sum_{\ell=1}^{L} U^{(\ell)} u$, where the nonzeros in $U^{(\ell)} u$ correspond to the nonzero elements in $E_\ell$, which form a partition of the identity, i.e., $U^{(\ell)} = E_\ell U$. With this notation, we write the following algorithm.
Algorithm 6. (Outer Asynchronous Two-stage Multisplitting). Given the initial vector $x_0 = x_0^{(1)} + \cdots + x_0^{(L)}$.
For $i = 1, 2, \ldots$

(4.1)   $x_i^{(\ell)} = \begin{cases} x_{i-1}^{(\ell)} & \text{if } \ell \notin J_i \\[2pt] H^{(\ell)}(i)\big(x_{r(1,i)}^{(1)} + \cdots + x_{r(L,i)}^{(L)}\big) + Q^{(\ell)}(i)\, b & \text{if } \ell \in J_i, \end{cases}$

with H(i) and Q(i) as defined in (3.2) and (3.3), respectively.
For easy comparison with Algorithm 8, we rewrite (4.1) explicitly as

(4.2)   $x_i^{(\ell)} = \begin{cases} x_{i-1}^{(\ell)} & \text{if } \ell \notin J_i \\[2pt] E_\ell \Big[ R_\ell^{s(\ell,i)} x_{r(\ell,i)}^{(\ell)} + \sum_{j=0}^{s(\ell,i)-1} R_\ell^j B_\ell^{-1} \Big( N_\ell \sum_{k=1}^{L} x_{r(k,i)}^{(k)} + b \Big) \Big] & \text{if } \ell \in J_i. \end{cases}$
We always assume that the asynchronous iterations satisfy the following conditions, which are very natural in asynchronous computations; see, e.g., [9], [15].

(4.3)   $r(\ell,i) < i$ for all $\ell = 1, \ldots, L$, $i = 1, 2, \ldots$.
(4.4)   $\lim_{i\to\infty} r(\ell,i) = \infty$ for all $\ell = 1, \ldots, L$.
(4.5)   The set $\{i \mid \ell \in J_i\}$ is unbounded for all $\ell = 1, \ldots, L$.

Condition (4.3) simply states that only components previously computed are used, and not future ones. Condition (4.5) is equivalent to what Bru, Elsner and Neumann [2] and others, e.g., Bru, Migallón and Penadés [6], call an admissible sequence, i.e., implying that every component is updated infinitely often. These authors also use the concept of a regulated sequence, i.e., implying that the number of iterations between two updates of the same component is uniformly bounded. Our condition (4.4) is slightly more general since no such bound is assumed.
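A toy sequential emulation of Algorithm 6 under conditions (4.3)–(4.5) may clarify the bookkeeping. Everything below is an assumption for illustration: the sets $J_i$ and the delays are drawn at random (bounded by `max_delay`, which is stronger than (4.4) requires), and the inner splittings are again Jacobi.

```python
import numpy as np

def outer_async_simulation(A, b, Ms, Es, s, steps, max_delay=3, seed=0):
    """Sequential emulation of Algorithm 6: at step i a random subset J_i
    of blocks is updated from randomly delayed iterates x_{r(k,i)}."""
    rng = np.random.default_rng(seed)
    L = len(Ms)
    x = np.zeros_like(b)
    history = [x.copy()]                 # history[r] stores iterate r
    for i in range(1, steps + 1):
        J = rng.choice(L, size=rng.integers(1, L + 1), replace=False)
        x = x.copy()
        for l in J:
            # assemble the delayed vector sum_k x_{r(k,i)}^{(k)}, r(k,i) < i
            z = np.zeros_like(b)
            for k in range(L):
                r = max(0, i - 1 - rng.integers(0, max_delay))
                z += Es[k] @ history[r]
            M, E = Ms[l], Es[l]
            N = M - A
            B = np.diag(np.diag(M))      # illustrative Jacobi inner splitting
            y = z.copy()
            for _ in range(s):           # s inner iterations as in (4.2)
                y = np.linalg.solve(B, (B - M) @ y + N @ z + b)
            x = x - E @ x + E @ y        # overwrite only the l-th part
        history.append(x.copy())
    return x
```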
If in (4.1), i.e., in Algorithm 6, we replace H(i) by T(i) as defined in (3.5), using $S_\ell(\omega)$ as defined in (3.4), $\omega > 0$, and if we replace Q(i) by $P(i) = \sum_{\ell=1}^{L} E_\ell P_\ell(i)$, where the $P_\ell(i)$ are as in (3.11), we obtain a Relaxed Outer Asynchronous Two-stage Multisplitting Algorithm (Algorithm 7). This algorithm, which is the asynchronous version of Algorithm 5, can also be obtained by replacing $R_\ell$ by $S_\ell(\omega)$ in (4.2).
Theorem 4.1. Let $A^{-1} \ge O$. Let the splittings (1.4) be regular and the splittings (1.9) be weak regular. Assume that the sequence $r(\ell,i)$ and the sets $J_i$, $\ell = 1, \ldots, L$, $i = 1, 2, \ldots$, satisfy conditions (4.3)–(4.5). Then, the Outer Asynchronous Two-stage Multisplitting Algorithm 6 converges to $x^\star$ with $Ax^\star = b$ for any initial vector $x_0$ and for any sequence of numbers of inner iterations $s(\ell,i) \ge 1$, $\ell = 1, \ldots, L$, $i = 1, 2, \ldots$. If, in addition, we assume $0 < \omega < 1$, the theorem holds for the Relaxed Outer Asynchronous Two-stage Multisplitting Algorithm 7.

Proof. The proof follows in the same way as the proof of [15, Theorem 3.3], which in turn is based on [8, Theorem 3.4].
Theorem 4.2. Let A be an H-matrix. Let the splittings (1.4) and (1.9) be H-compatible splittings. Assume that the sequence $r(\ell,i)$ and the sets $J_i$, $\ell = 1, \ldots, L$, $i = 1, 2, \ldots$, satisfy conditions (4.3)–(4.5). Then, the Outer Asynchronous Two-stage Multisplitting Algorithm 6 converges to $x^\star$ with $Ax^\star = b$ for any initial vector $x_0$ and for any sequence of numbers of inner iterations $s(\ell,i) \ge 1$, $\ell = 1, \ldots, L$, $i = 1, 2, \ldots$. If, in addition, we assume $0 < \omega < 1$, the theorem holds for the Relaxed Outer Asynchronous Two-stage Multisplitting Algorithm 7.

Proof. The proof follows in the same way as the proof of [15, Theorem 3.4].
We consider now asynchronous two-stage multisplitting algorithms where, at each inner iteration, the most recent information from the other parts of the iterate is used. In other words, the parts $x_{r(k,i)}^{(k)}$ in the sum in (4.2) may differ for different values of j, $j = 0, \ldots, s(\ell,i)-1$ ($\ell \in J_i$). To reflect this, we therefore use indices of the form $r(k,j,i)$. Thus, for example, if the subvector $x_i^{(1)} = E_1 x_i$ is being computed by one processor, one component at a time, the computed components can be read by the other processors and used in the computation of the other subvectors $x^{(\ell)}$, before all components of $x_i^{(1)}$ are computed, and $x_{r(1,j,i)}^{(1)}$ may change for different j, $j = 0, \ldots, s(\ell,i)-1$ ($\ell \in J_i$), while $x_i^{(1)}$ is being computed. These algorithms are called totally asynchronous to distinguish them from the outer asynchronous ones; see [15, section 4] for a discussion of possible advantages of these methods.
Algorithm 8. (Totally Asynchronous Two-stage Multisplitting). Given the initial vector $x_0 = x_0^{(1)} + \cdots + x_0^{(L)}$.
For $i = 1, 2, \ldots$

(4.6)   $x_i^{(\ell)} = \begin{cases} x_{i-1}^{(\ell)} & \text{if } \ell \notin J_i \\[2pt] E_\ell \Big[ R_\ell^{s(\ell,i)} x_{r(\ell,0,i)}^{(\ell)} + \sum_{j=0}^{s(\ell,i)-1} R_\ell^j B_\ell^{-1} \Big( N_\ell \sum_{k=1}^{L} x_{r(k,j,i)}^{(k)} + b \Big) \Big] & \text{if } \ell \in J_i. \end{cases}$
Analogous to (4.3)–(4.4), we now assume

(4.7)   $\begin{cases} r(k,j,i) < i, & \text{for all } k = 1, \ldots, L,\ j = 0, \ldots, s(k,i)-1,\ i = 1, 2, \ldots, \\[2pt] \displaystyle\lim_{i\to\infty}\ \min_{j=0,\ldots,s(k,i)-1} r(k,j,i) = \infty, & \text{for all } k = 1, \ldots, L. \end{cases}$
Again, if $R_\ell$ is replaced by $S_\ell(\omega)$ in (4.6), we obtain a Relaxed Totally Asynchronous Two-stage Multisplitting Algorithm (Algorithm 9).
Theorem 4.3. Let $A^{-1} \ge O$. Let the splittings (1.4) be regular and the splittings (1.9) be weak regular. Assume that the numbers $r(k,j,i)$ and the sets $J_i$, $k = 1, \ldots, L$, $i = 1, 2, \ldots$, satisfy conditions (4.5) and (4.7). Then, the Totally Asynchronous Two-stage Multisplitting Algorithm 8 converges to $x^\star$ with $Ax^\star = b$ for any initial vector $x_0$ and for any sequence of numbers of inner iterations $s(\ell,i) \ge 1$, $\ell = 1, \ldots, L$, $i = 1, 2, \ldots$. If, in addition, we assume $0 < \omega < 1$, the theorem holds for the Relaxed Totally Asynchronous Two-stage Multisplitting Algorithm 9.

Proof. The proof follows in the same way as the proof of [15, Theorem 4.3], which in turn is based on [9, Theorem 2.1].
Theorem 4.4. Let A be an H-matrix. Let the splittings (1.4) and (1.9) be H-compatible splittings. Assume that the numbers $r(k,j,i)$ and the sets $J_i$, $k = 1, \ldots, L$, $i = 1, 2, \ldots$, satisfy conditions (4.5) and (4.7). Then, the Totally Asynchronous Two-stage Multisplitting Algorithm 8 converges to $x^\star$ with $Ax^\star = b$ for any initial vector $x_0$ and for any sequence of numbers of inner iterations $s(\ell,i) \ge 1$, $\ell = 1, \ldots, L$, $i = 1, 2, \ldots$. If, in addition, we assume $0 < \omega < 1$, the theorem holds for the Relaxed Totally Asynchronous Two-stage Multisplitting Algorithm 9.

Proof. The proof follows in the same way as the proof of [15, Theorem 4.4].
5. Numerical Experiments. We implemented the synchronous and outer asynchronous algorithms described in this paper on a shared memory computer, an Alliant FX/80 with 8 vector processors. We considered the problem of Laplace's equation with Dirichlet boundary conditions on the rectangle $\Omega = [0,a] \times [0,b]$. We discretized the domain $\Omega$, using five-point finite differences, with $J \times K$ points equally spaced by h. This discretization yields the linear system (1.1), where A is block tridiagonal, $A = \mathrm{tridiag}[-I, B, -I]$, where I and B are $K \times K$ matrices, I is the identity, and $B = \mathrm{tridiag}[-1, 4, -1]$. Thus, A has $J \times J$ blocks of size $K \times K$. It follows directly that A is an M-matrix, and thus also an H-matrix.
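The test matrix is easy to assemble; a sketch with J and K as parameters (dense storage for simplicity, whereas an actual code would use a sparse format):

```python
import numpy as np

def laplace_matrix(J, K):
    """A = tridiag[-I, B, -I] with J x J blocks of size K x K and
    B = tridiag[-1, 4, -1]: the 5-point Laplacian on a J x K grid."""
    n = J * K
    A = np.zeros((n, n))
    B = 4.0 * np.eye(K) - np.eye(K, k=1) - np.eye(K, k=-1)
    I = np.eye(K)
    for j in range(J):
        A[j*K:(j+1)*K, j*K:(j+1)*K] = B
        if j > 0:
            A[j*K:(j+1)*K, (j-1)*K:j*K] = -I
            A[(j-1)*K:j*K, j*K:(j+1)*K] = -I
    return A
```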
In order to fully utilize the 8 processors, we let L = 8. The matrix A can be written as

$A = \begin{pmatrix} D_1 & -I & & \\ -I & D_2 & -I & \\ & \ddots & \ddots & \ddots \\ & & -I & D_8 \end{pmatrix},$
where $D_\ell = \mathrm{tridiag}[-I, B, -I]$ is of order $n_\ell$, $\ell = 1, \ldots, 8$; cf. (1.2). We consider an (outer) Block Jacobi algorithm, and thus we have the matrices $M_\ell = \mathrm{Diag}[D_1, D_2, \ldots, D_8]$, $\ell = 1, \ldots, 8$, and $N_\ell$ such that $A = M_\ell - N_\ell$. The matrices $E_\ell$, $\ell = 1, \ldots, 8$, are partitioned according to the structure of the matrices $M_\ell$, with the $\ell$th block equal to the identity block and zeros elsewhere, i.e., they form a partition of the identity; see the discussion after Algorithm 2. Thus, when we perform $s(\ell,i)$ steps of an iterative method to solve the linear system $M_\ell y = N_\ell x_{i-1} + b$, we only need to update the $\ell$th block of y; see (1.5) and (1.6). In the experiments reported here, $s(\ell,i)$ steps of the SOR method are performed to approximate the solutions of the linear systems (1.5).
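The inner SOR solve per block can be written as s sweeps with the SOR splitting of $M_\ell$; a sketch (the assignment of blocks to processors and the outer convergence test are omitted):

```python
import numpy as np

def sor_steps(M, rhs, y, s, omega):
    """s SOR sweeps on M y = rhs, i.e., the inner splitting
    B = (1/omega) D - Lo, with D the diagonal and Lo the (negated)
    strictly lower triangular part of M."""
    D = np.diag(np.diag(M))
    Lo = -np.tril(M, -1)            # M = D - Lo - Up
    B = D / omega - Lo
    C = B - M                       # C = (1/omega - 1) D + Up
    for _ in range(s):
        y = np.linalg.solve(B, C @ y + rhs)
    return y
```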
Experiments were performed with matrices of different orders. The results are similar for all matrices (see [32]), and therefore we only present here results for n = 5632. The matrix A has J = 11 diagonal blocks B of size K = 512, and we chose $n_\ell = 1024$ for $\ell = 1, 2, 3$, and $n_\ell = 512$ for $\ell = 4, \ldots, 8$, to study the effect of different numbers of nonzeros in $E_\ell$. With this choice, we can obtain a good load balance by having processors 4–8 perform twice as many (inner) iterations as processors 1–3. Recall that in this example the number of nonzeros is proportional to the order of the blocks. This example is similar to the situation in which the diagonal blocks correspond to natural physical structures of different size, or to a discretization of varying resolution.
We report the CPU time in seconds (for the concurrent compilation) as a function of the relaxation parameter $\omega > 0$ for selected values of the non-stationary parameters $s(\ell,i)$. In Figure 5.1 the results for the synchronous algorithms are presented. We use the notation $2_3 4_5$ to represent the non-stationary parameters $s(\ell,i) = 2$ for $\ell = 1, 2, 3$ and $s(\ell,i) = 4$ for $\ell = 4, \ldots, 8$, for all i, i.e., to represent that each of the first three processors updates its vector twice and each of the last five processors updates its vector four times, in any outer iteration. Similar notation is used for other values of the non-stationary parameters. In Figure 5.1 it can be observed that the CPU time for the algorithm with one inner iteration ($s(\ell,i) = 1$, or $1_8$) is larger than that for the other cases, both in the stationary case with more inner iterations
or in the non-stationary cases.

Fig. 5.1. Synchronous algorithms, n = 5632. (CPU time in seconds versus the relaxation parameter $\omega$; curves for the synchronous non-stationary parameters $2_3 4_5$ and $1_3 2_5$ and the synchronous stationary parameters $2_8$ and $1_8$.)
We point out that in the proofs of theorems 3.2–4.4, the restriction ω ≤ 1 is
needed so that (3.7) holds. Nevertheless, as can be seen in the experiments reported
here, the conclusions of the theorems hold in these examples for certain ω > 1.
Fig. 5.2. Asynchronous algorithms, n = 5632. (CPU time in seconds versus the relaxation parameter $\omega$; curves for the asynchronous non-stationary parameters $2_3 4_5$ and $1_3 2_5$ and the asynchronous stationary parameters $2_8$ and $1_8$.)
In Figure 5.2 the CPU times of the corresponding outer asynchronous algorithms are presented. In this case, the non-stationary algorithms behave better than the stationary ones. Observe that when $\omega$ is optimal, all versions give similar times, but when $\omega$ is less than the optimal value, in particular when $\omega = 1$, the non-stationary schemes produce faster convergence. Finally, we note that, with the choice of the orders $n_\ell$ used to achieve good load balancing, the asynchronous algorithms are better than the synchronous algorithms. Figure 5.3 illustrates this for the non-stationary parameters $2_3 4_5$.
Fig. 5.3. Synchronous and asynchronous algorithms, n = 5632. (CPU time in seconds versus the relaxation parameter $\omega$; curves for the synchronous and asynchronous non-stationary parameters $2_3 4_5$.)
6. Conclusion. We have shown that, under certain conditions, the synchronous and asynchronous two-stage multisplitting algorithms presented in this paper converge for any number of inner iterations $s(\ell,i)$. This theory permits the use of different stopping criteria for the inner iterations, e.g., by specifying a certain tolerance for the inner residual; cf. Golub and Overton [17], [18].

The choice of optimal sequences $s(\ell,i)$, or simply good ones, is problem dependent and not very well understood. In our experience, a few inner iterations often produce good overall convergence results. Moreover, on multiprocessors, a good choice for this sequence is one which balances the work across the processors, producing a good overall load balance. The experiments in section 5 illustrate such choices.
Acknowledgments. We acknowledge and appreciate the referees’ questions and
recommendations which led to improvements in the paper.
REFERENCES
[1] Abraham Berman and Robert J. Plemmons, Nonnegative Matrices in the Mathematical
Sciences. Academic Press, New York, third edition, 1979. Reprinted by SIAM, Philadelphia,
1994.
[2] Rafael Bru, Ludwig Elsner, and Michael Neumann, Models of parallel chaotic iteration
methods, Linear Algebra Appl., 103 (1988), pp. 175–192.
[3] Rafael Bru, Ludwig Elsner, and Michael Neumann, Convergence of infinite products of matrices and inner-outer iteration schemes, Electron. Trans. Numer. Anal., 2 (1994), pp. 183–193.
[4] Rafael Bru and Robert Fuster, Parallel chaotic extrapolated Jacobi method, Appl. Math.
Lett., 3 (1990), pp. 65–69.
[5] Rafael Bru, Violeta Migallón, and José Penadés, Chaotic inner-outer iterative schemes, in Proc. of the Fifth SIAM Conference on Applied Linear Algebra, J. G. Lewis, ed., SIAM Press, Philadelphia, 1994, pp. 434–438.
[6] Rafael Bru, Violeta Migallón, and José Penadés, Chaotic methods for the parallel solution of linear systems, Comput. Systems Engrg., to appear.
[7] Wang Deren, On the convergence of the parallel multisplitting AOR method, Linear Algebra
Appl., 154 (1991), pp. 473–486.
[8] Mouhamed Nabih El Tarazi, Some convergence results for asynchronous algorithms, Numer.
Math., 39 (1982), pp. 325–340.
[9] Andreas Frommer, On asynchronous iterations in partially ordered spaces, Numer. Funct.
Anal. Optim., 12 (1991), pp. 315–325.
[10] Andreas Frommer and Günter Mayer, Convergence of relaxed parallel multisplitting methods, Linear Algebra Appl., 119 (1989), pp. 141–152.
[11] Andreas Frommer and Günter Mayer, On the theory and practice of multisplitting methods in parallel computation, Computing, 49 (1992), pp. 63–74.
[12] Andreas Frommer and Bert Pohl, A comparison result for multisplittings based on overlapping blocks and its application to waveform relaxation methods, Technical Report 93-05, Eidgenössische Technische Hochschule, Seminar für Angewandte Mathematik, Zurich, May 1993. Numer. Linear Algebra Appl., 2(4), (1995), to appear.
[13] Andreas Frommer and Bert Pohl, Comparison results for splittings based on overlapping blocks, in Proc. of the Fifth SIAM Conference on Applied Linear Algebra, J. G. Lewis, ed., SIAM Press, Philadelphia, 1994, pp. 29–33.
[14] Andreas Frommer and Daniel B. Szyld, H-splittings and two-stage iterative methods, Numer. Math., 63 (1992), pp. 345–356.
[15] Andreas Frommer and Daniel B. Szyld, Asynchronous two-stage iterative methods, Numer. Math., 69 (1994), pp. 141–153.
[16] Robert Fuster, Violeta Migallón, and José Penadés, Parallel chaotic extrapolated like-Jacobi methods, Linear Algebra Appl., to appear.
[17] Gene H. Golub and Michael L. Overton, Convergence of a two-stage Richardson iterative procedure for solving systems of linear equations, in Numerical Analysis, G. A. Watson, ed., Proc. of the Ninth Biennial Conference, Dundee, Scotland, 1981, Lecture Notes in Mathematics 912, Springer Verlag, New York, 1982, pp. 128–139.
[18] Gene H. Golub and Michael L. Overton, The convergence of inexact Chebyshev and Richardson iterative methods for solving linear systems, Numer. Math., 53 (1988), pp. 571–593.
[19] Apostolos Hadjidimos and Dimitrios Noutsos, On a matrix identity connecting iteration operators associated with a p-cyclic matrix, Linear Algebra Appl., 182 (1993), pp. 157–177.
[20] Alston S. Householder, The Theory of Matrices in Numerical Analysis, Blaisdell, Waltham, Mass., 1964. Reprinted by Dover, New York, 1975.
[21] Mark T. Jones and Daniel B. Szyld, Two-stage multisplitting methods with overlapping blocks, Technical Report 94-31, Department of Mathematics, Temple University, Philadelphia, Pa., March 1994. Also available as Technical Report CS-94-224, Computer Science Department, University of Tennessee, Knoxville, March 1994. Numer. Linear Algebra Appl., to appear.
[22] Paul J. Lanzkron, Donald J. Rose, and Daniel B. Szyld, Convergence of nested classical
iterative methods for linear systems, Numer. Math., 58 (1991), pp. 685–702.
[23] Ivo Marek and Daniel B. Szyld, Splittings of M -operators: Irreducibility and the index of
the iteration operator, Numer. Funct. Anal. Optim., 11 (1990), pp. 529–553.
[24] Günter Mayer, Comparison theorems for iterative methods based on strong splittings, SIAM
J. Numer. Anal., 24 (1987), pp. 215–227.
[25] Arnold Neumaier, New techniques for the analysis of linear interval equations, Linear Algebra
Appl., 58 (1984), pp. 273–325.
[26] Michael Neumann and Robert J. Plemmons, Convergence of parallel multisplitting iterative
methods for M -matrices, Linear Algebra Appl., 88 (1987), pp. 559–573.
[27] Nancy K. Nichols, On the convergence of two-stage iterative processes for solving linear
equations, SIAM J. Numer. Anal., 10 (1973), pp. 460–469.
[28] Dianne P. O'Leary and Robert E. White, Multi-splittings of matrices and parallel solution of linear systems, SIAM J. Algebraic Discrete Methods, 6 (1985), pp. 630–640.
[29] James M. Ortega and Werner C. Rheinboldt, Iterative Solution of Nonlinear Equations in
Several Variables, Academic Press, New York and London, 1970.
[30] Alexander M. Ostrowski, Über die Determinanten mit überwiegender Hauptdiagonale, Commentarii Mathematici Helvetici, 10 (1937), pp. 69–96.
[31] Alexander M. Ostrowski, Determinanten mit überwiegender Hauptdiagonale und die absolute Konvergenz von linearen Iterationsprozessen, Commentarii Mathematici Helvetici, 30 (1956), pp. 175–210.
[32] José Penadés, Métodos Iterativos Paralelos para la Resolución de Sistemas Lineales basados en Multiparticiones, PhD thesis, Departamento de Tecnología Informática y Computación, Universidad de Alicante, November 1993, in Spanish.
[33] Werner C. Rheinboldt and James S. Vandergraft, A simple approach to the Perron-Frobenius theory for positive operators on general partially-ordered finite-dimensional linear spaces, Math. Comp., 27 (1973), pp. 139–145.
[34] F. Robert, M. Charnay, and F. Musy, Itérations chaotiques série-parallèle pour des
équations non-linéaires de point fixe, Aplikace Matematiky, 20 (1975), pp. 1–38.
[35] Hans Schneider, Theorems on M -splittings of a singular M -matrix which depend on graph
structure, Linear Algebra Appl., 58 (1984), pp. 407–424.
[36] Xiaodi Sun, Convergence of parallel multisplitting chaotic iterative methods, Numer. Math. J.
Chinese Univ. (Gao Deng Xue Xiao Ji Suan Shu Xue Xue Bao), 14 (1992), pp. 183–187,
in Chinese.
[37] Daniel B. Szyld, Synchronous and asynchronous two-stage multisplitting methods, in Proc.
of the Fifth SIAM Conference on Applied Linear Algebra, J. G. Lewis, ed., SIAM Press,
Philadelphia, 1994, pp. 39–44.
[38] Daniel B. Szyld and Mark T. Jones, Two-stage and multisplitting methods for the parallel
solution of linear systems, SIAM J. Matrix Anal. Appl., 13 (1992), pp. 671–679.
[39] Richard S. Varga, Factorization and normalized iterative methods, in Boundary Problems in Differential Equations, Rudolph E. Langer, ed., The University of Wisconsin Press, Madison, 1960, pp. 121–142.
[40] Richard S. Varga, Matrix Iterative Analysis, Prentice-Hall, Englewood Cliffs, New Jersey, 1962.
[41] Richard S. Varga, On recurring theorems on diagonal dominance, Linear Algebra Appl., 13 (1976), pp. 1–9.
[42] Robert E. White, Block multisplittings and parallel iterative methods, Comput. Methods Appl. Mech. Engrg., 64 (1987), pp. 567–577.
[43] Robert E. White, Multisplitting with different weighting schemes, SIAM J. Matrix Anal. Appl., 10 (1989), pp. 481–493.
[44] Robert E. White, Multisplitting of a symmetric positive definite matrix, SIAM J. Matrix Anal. Appl., 11 (1990), pp. 69–82.