Electronic Transactions on Numerical Analysis.
Volume 3, pp. 24-38, March 1995.
Copyright © 1995, Kent State University.
ISSN 1068-9613.
Kent State University
etna@mcs.kent.edu
PARALLEL, SYNCHRONOUS AND ASYNCHRONOUS TWO-STAGE
MULTISPLITTING METHODS ∗
RAFAEL BRU†, VIOLETA MIGALLÓN‡, JOSÉ PENADÉS‡, AND DANIEL B. SZYLD§
Abstract. Different types of synchronous and asynchronous two-stage multisplitting algorithms for the solution of linear systems are analyzed. Algorithms that have appeared in the literature are reviewed, and new ones are presented. Convergence properties of these algorithms are studied when the matrix in question is either monotone or an H-matrix. Relaxed versions of these algorithms are also studied. Computational experiments on a shared memory multiprocessor vector computer are presented.
Key words. asynchronous methods, two-stage iterative methods, linear systems, multisplittings, parallel algorithms.
AMS subject classifications. 65F10, 65F15.
1. Introduction. In this paper we present various types of synchronous and asynchronous two-stage multisplitting algorithms for the solution of linear systems of the form

(1.1)   $Ax = b,$

where A is an $n \times n$ nonsingular matrix. We first give a general convergence result, and then, for the most part, concentrate our study on two important cases: when A is monotone, i.e., when $A^{-1} \ge O$ [1], and when A is an H-matrix [30], [41]. The multisplitting algorithm was introduced by O'Leary and White [28] and was further studied by many authors; see, e.g., Frommer and Mayer [10], [11], Neumann and Plemmons [26], or White [42], [44].
One can think of the multisplitting algorithm as an extension and parallel generalization of the classical Block Jacobi algorithm, see, e.g., Varga [40], which we review in what follows. Let the matrix A be partitioned into $L \times L$ blocks

(1.2)   $\begin{pmatrix} A_{11} & A_{12} & \cdots & A_{1L} \\ A_{21} & A_{22} & \cdots & A_{2L} \\ \vdots & \vdots & & \vdots \\ A_{L1} & A_{L2} & \cdots & A_{LL} \end{pmatrix},$

with the diagonal blocks $A_{\ell\ell}$ being square nonsingular of order $n_\ell$, $\ell = 1, \ldots, L$, $\sum_{\ell=1}^{L} n_\ell = n$, and the vectors x and b are partitioned conformally.
∗ Received October 3, 1994. Accepted February 27, 1995. Communicated by Richard Varga.
† Departament de Matemàtica Aplicada, Universitat Politècnica de València, E-46071 València, Spain (rbru@mat.upv.es). This research was partially supported by both Spanish CICYT grant number TIC91-1157-C03-01 and the ESPRIT III basic research programme of the EC under contract No. 9072 (project GEPPCOM).
‡ Departament de Tecnologia Informàtica i Computació, Universitat d'Alacant, E-03080 Alacant, Spain (violeta@dtic.ua.es, jpenades@dtic.ua.es). This research was supported by Spanish CICYT grant number TIC91-1157-C03-01.
§ Department of Mathematics, Temple University, Philadelphia, Pennsylvania 19122-2585, USA (szyld@euclid.math.temple.edu). This research was supported by the National Science Foundation grant DMS-9201728.
Algorithm 1. (Block Jacobi). Given the initial vector $x_0^T = [(x_0^{(1)})^T, \ldots, (x_0^{(L)})^T]$.
For $i = 1, 2, \ldots,$ until convergence.
    For $\ell = 1$ to $L$

(1.3)   $A_{\ell\ell}\, x_i^{(\ell)} = b^{(\ell)} - \sum_{k=1,\, k \neq \ell}^{L} A_{\ell k}\, x_{i-1}^{(k)}.$

Each system (1.3) can be solved in parallel by a different processor of a parallel computer, and the vector iterate at each step is

$x_i^T = [(x_i^{(1)})^T, (x_i^{(2)})^T, \ldots, (x_i^{(L)})^T].$
The multisplitting algorithm [28] consists of having a collection of splittings

(1.4)   $A = M_\ell - N_\ell, \qquad \ell = 1, \ldots, L,$

and diagonal nonnegative weighting matrices $E_\ell$ which add to the identity, and performing the following algorithm.

Algorithm 2. (Multisplitting). Given the initial vector $x_0$.
For $i = 1, 2, \ldots,$ until convergence.
    For $\ell = 1$ to $L$

(1.5)   $M_\ell\, y_\ell = N_\ell\, x_{i-1} + b$

(1.6)   $x_i = \sum_{\ell=1}^{L} E_\ell\, y_\ell.$
As can be seen, Algorithm 1 is a special case of Algorithm 2 when all splittings are the same, namely, $M_\ell = \mathrm{Diag}(A_{11}, \ldots, A_{LL})$, the block-diagonal matrix consisting of the diagonal blocks of A, and the diagonal weighting matrices $E_\ell$ have ones in the entries corresponding to the diagonal block $A_{\ell\ell}$ and zeros otherwise. In this case we say that the matrices $E_\ell$ form a partition of the identity. A rendition of Algorithm 1 can also be obtained from Algorithm 2 by setting

(1.7)   $M_\ell = \mathrm{Diag}(I, \ldots, I, A_{\ell\ell}, I, \ldots, I),$ and
(1.8)   $E_\ell = \mathrm{Diag}(O, \ldots, O, I, O, \ldots, O),$

for $\ell = 1, \ldots, L$, i.e., the same partition of the identity just discussed.
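The general scheme of Algorithm 2 can be sketched in the same style. The sketch below is an illustration under the assumptions stated in the paper for the weights (diagonal, nonnegative, summing to the identity); the particular splittings `Ms` passed in are hypothetical choices, not the paper's.

```python
import numpy as np

def multisplitting_step(A, b, x, Ms, Es):
    """One outer step of Algorithm 2: for each splitting A = M_l - N_l,
    solve M_l y_l = N_l x + b (these solves are independent and could run
    in parallel), then combine with the weights: x_next = sum_l E_l y_l."""
    x_next = np.zeros_like(x)
    for M, E in zip(Ms, Es):
        N = M - A                          # N_l from A = M_l - N_l
        y = np.linalg.solve(M, N @ x + b)  # exact solve of (1.5)
        x_next += E @ y                    # (1.6); zero-weight rows of y are not needed
    return x_next
```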
Convergence of the multisplitting algorithm was first established for $A^{-1} \ge O$ by O'Leary and White [28] when the splittings (1.4) are weak regular, i.e., when $M_\ell^{-1} \ge O$ and $M_\ell^{-1} N_\ell \ge O$, where the inequalities are understood component-wise [1], [29], [40]. Comparison of convergence of different splittings (1.4) when they are M-splittings, i.e., when $M_\ell$ is an M-matrix (defined in the next section) and $N_\ell \ge O$ [23], [35], was studied by Neumann and Plemmons [26]. If the splittings (1.4) correspond to diagonal blocks of A of larger size than the corresponding number of ones in the partition of the identity, i.e., if $A_{\ell\ell}$ in (1.7) has order $\tilde{n}_\ell > n_\ell$, the order of I in (1.8), then the multisplitting Algorithm 2 corresponds to an overlapping algorithm, which was shown to have better asymptotic convergence rate than the Block Jacobi algorithm (without overlap); see Frommer and Pohl [12], [13], and also Jones and Szyld [21]. Weighting matrices which are not partitions of the identity, and therefore induce another type
of overlap, have also been studied; see, e.g., O'Leary and White [28], White [43], or Frommer and Mayer [11].
As is the case for Block Jacobi, the systems (1.5) can be solved by different processors in parallel. Moreover, the components of $y_\ell$ corresponding to zeros in the diagonal of $E_\ell$ need not be computed; see (1.6).
When the linear systems (1.5), or (1.3), are not solved exactly, but rather their solutions approximated by iterative methods, we are in the presence of a two-stage method. Two-stage methods, sometimes called inner-outer iterations, were studied, e.g., by Nichols [27], Golub and Overton [17], [18], Lanzkron, Rose and Szyld [22], Frommer and Szyld [14], [15], Bru, Elsner and Neumann [3], and Bru, Migallón and Penadés [6]. When the systems (1.5) are solved iteratively in each processor, using the splittings

(1.9)   $M_\ell = B_\ell - C_\ell, \qquad \ell = 1, \ldots, L,$

and performing a fixed number s of iterations, one obtains the following algorithm.
Algorithm 3. (Two-stage Multisplitting). Given the initial vector $x_0$, and the fixed number s of inner iterations.
For $i = 1, 2, \ldots,$ until convergence.
    For $\ell = 1$ to $L$
        $y_{\ell,0} = x_{i-1}$
        For $j = 1$ to $s$

(1.10)   $B_\ell\, y_{\ell,j} = C_\ell\, y_{\ell,j-1} + N_\ell\, x_{i-1} + b$

(1.11)   $x_i = \sum_{\ell=1}^{L} E_\ell\, y_{\ell,s}.$
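A sketch of one outer step of Algorithm 3 follows. Purely for illustration, the inner splitting (1.9) is taken to be the point Jacobi splitting of $M_\ell$ (with $B_\ell$ the diagonal of $M_\ell$); the paper leaves the inner splittings general.

```python
import numpy as np

def two_stage_step(A, b, x, Ms, Es, s):
    """One outer step of Algorithm 3: s inner iterations (1.10) per splitting.
    Inner splitting M_l = B_l - C_l with B_l = diag(M_l) (Jacobi, illustrative)."""
    x_next = np.zeros_like(x)
    for M, E in zip(Ms, Es):
        N = M - A
        B = np.diag(np.diag(M))   # B_l of the inner splitting (1.9)
        C = B - M                 # C_l = B_l - M_l
        y = x.copy()              # y_{l,0} = x_{i-1}
        for _ in range(s):
            # (1.10): B_l y_{l,j} = C_l y_{l,j-1} + N_l x_{i-1} + b
            y = np.linalg.solve(B, C @ y + N @ x + b)
        x_next += E @ y           # (1.11)
    return x_next
```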
Convergence of this algorithm for any number of inner iterations s was established for $A^{-1} \ge O$ by Szyld and Jones [38] when the outer splittings (1.4) are regular splittings, i.e., when $M_\ell^{-1} \ge O$ and $N_\ell \ge O$ [1], [40], and the inner splittings (1.9) are weak regular splittings.

Algorithm 3 reduces to Algorithm 2 when the inner splitting (1.9) is the trivial $M_\ell = M_\ell - O$, and $s = 1$. We also note that, under certain circumstances, if the number of inner iterations is large enough, i.e., as $s \to \infty$, the iterates produced by Algorithm 3 resemble those produced by Algorithm 2, i.e., $y_{\ell,s} \to y_\ell$, as $s \to \infty$, where $y_\ell$ is the solution of (1.5); cf. [14, section 2] and the proof of Theorem 3.1.
When the number of inner iterations varies for each splitting and for each outer iteration, i.e., when $s = s(\ell, i)$ in Algorithm 3, we say that we have a Non-stationary Two-stage Multisplitting Algorithm (Algorithm 4). Model A in Bru, Elsner and Neumann [2] is a special case of this algorithm, when the outer splittings (1.4) are all $A = A - O$.
A relaxation parameter $\omega > 0$ can be introduced and replace the computation of $y_{\ell,j}$ in (1.10) with the equation

(1.12)   $B_\ell\, y_{\ell,j} = \omega(C_\ell\, y_{\ell,j-1} + N_\ell\, x_{i-1} + b) + (1 - \omega)B_\ell\, y_{\ell,j-1}.$

This is equivalent to replacing the splitting (1.9) by $M_\ell = \frac{1}{\omega}B_\ell - \left(\frac{1-\omega}{\omega}B_\ell + C_\ell\right)$. Clearly, with $\omega = 1$, equation (1.10) is recovered. In the case of $\omega \neq 1$ we have a Relaxed Non-stationary Two-stage Multisplitting Algorithm (Algorithm 5); see also Bru and Fuster [4], Bru, Migallón and Penadés [6], Deren [7], Frommer and Mayer [10], Fuster, Migallón and Penadés [16], and Sun [36], where relaxed (or extrapolated) algorithms are considered.
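In code, the relaxation only changes the inner-loop body. A minimal sketch of (1.12), with hypothetical arguments for the splitting matrices of one fixed $\ell$:

```python
import numpy as np

def relaxed_inner(B, C, N, b, x_outer, y, s, omega):
    """s relaxed inner iterations (1.12); omega = 1.0 recovers (1.10).
    B, C: inner splitting of M; N: outer splitting matrix; y: y_{l,0}."""
    for _ in range(s):
        y = np.linalg.solve(B, omega * (C @ y + N @ x_outer + b)
                               + (1.0 - omega) * (B @ y))
    return y
```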
In this paper we study the convergence of the Non-stationary Two-stage Multisplitting Algorithm 4, together with its relaxed version, Algorithm 5; see section 3. We also study their extension to two different asynchronous algorithms, where the (approximate) solutions of the systems (1.5), by repeated solutions of (1.10) or (1.12), proceed in each processor without waiting for the completion of the computation of the iterates in the other processors; see section 4.

For each equation (1.12) one can work with a different relaxation parameter $\omega_\ell$, $\ell = 1, \ldots, L$, as is done, e.g., in the MSOR method; see, e.g., [19]. We point out that the convergence results in this paper can be readily extended to this multi-parameter case.
In the next section we present some notation, definitions and preliminary results
which we refer to later, while in section 5 we present some numerical experiments,
which illustrate the performance of the algorithms studied.
Some of the results in this paper were presented in preliminary form in [5] and
[37].
2. Preliminaries. We say that a vector x is nonnegative (positive), denoted $x \ge 0$ ($x > 0$), if all its entries are nonnegative (positive). Similarly, a matrix B is said to be nonnegative, denoted $B \ge O$, if all its entries are nonnegative or, equivalently, if it leaves invariant the set of all nonnegative vectors. We write $A \ge B$ for two matrices when $A - B \ge O$, and $x \ge y$ ($x > y$) for two vectors when $x - y \ge 0$ ($x - y > 0$). Given a matrix $A = (a_{ij})$, we define the matrix $|A| = (|a_{ij}|)$. It follows that $|A| \ge O$ and that $|AB| \le |A|\,|B|$ for any two matrices A and B of compatible size.
Let $Z^{n\times n}$ denote the set of all real $n \times n$ matrices which have all non-positive off-diagonal entries. A nonsingular matrix $A \in Z^{n\times n}$ is called an M-matrix if $A^{-1} \ge O$, i.e., if A is a monotone matrix; see, e.g., Berman and Plemmons [1] or Varga [40]. By $\rho(A)$ we denote the spectral radius of the square matrix A.
For any matrix $A = (a_{ij}) \in \mathbb{R}^{n\times n}$, we define its comparison matrix $\langle A\rangle = (\alpha_{ij})$ by $\alpha_{ii} = |a_{ii}|$, $\alpha_{ij} = -|a_{ij}|$, $i \neq j$. Following Ostrowski [30], [31], a nonsingular matrix A is said to be an H-matrix if $\langle A\rangle$ is an M-matrix. Of course, M-matrices are special cases of H-matrices. H-matrices, which are not necessarily monotone, arise in many applications and were studied by a number of authors in connection with iterative solutions of linear systems; see, e.g., the classical paper by Varga [41], or Frommer and Szyld [14] for an extensive bibliography and some examples.
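These definitions are straightforward to test numerically. The sketch below builds $\langle A\rangle$ and checks the H-matrix property via the characterization that a nonsingular matrix in $Z^{n\times n}$ is an M-matrix exactly when its inverse is nonnegative; the tolerance is an assumption to absorb rounding.

```python
import numpy as np

def comparison_matrix(A):
    """<A>: alpha_ii = |a_ii|, alpha_ij = -|a_ij| for i != j."""
    C = -np.abs(A)
    np.fill_diagonal(C, np.abs(np.diag(A)))
    return C

def is_H_matrix(A, tol=1e-12):
    """A is an H-matrix iff <A> is an M-matrix, i.e., <A> (which lies in
    Z^{n x n} by construction) is nonsingular with <A>^{-1} >= O entrywise."""
    cA = comparison_matrix(A)
    try:
        inv = np.linalg.inv(cA)
    except np.linalg.LinAlgError:
        return False
    return bool(np.all(inv >= -tol))
```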
Lemma 2.1. Let $A, B \in \mathbb{R}^{n\times n}$.
(a) If A is an M-matrix, $B \in Z^{n\times n}$, and $A \le B$, then B is an M-matrix.
(b) If A is an H-matrix, then $|A^{-1}| \le \langle A\rangle^{-1}$.
(c) If $|A| \le B$ then $\rho(A) \le \rho(B)$.
Proof. (a) and (c) can be found, e.g., in [29], 2.4.10 and 2.4.9, respectively. Part (b) goes back to Ostrowski [30]; see also, e.g., Neumaier [25].
Definition 2.2. Let $A \in \mathbb{R}^{n\times n}$. The representation $A = M - N$ is called a splitting if M is nonsingular. It is called a convergent splitting if $\rho(M^{-1}N) < 1$. A splitting $A = M - N$ is called
(a) regular if $M^{-1} \ge O$ and $N \ge O$ [39], [40],
(b) weak regular if $M^{-1} \ge O$ and $M^{-1}N \ge O$ [1], [29],
(c) an H-splitting if $\langle M\rangle - |N|$ is an M-matrix [14], and
(d) an H-compatible splitting if $\langle A\rangle = \langle M\rangle - |N|$ [14].
Lemma 2.3. Let $A = M - N$ be a splitting.
(a) If the splitting is weak regular, then $\rho(M^{-1}N) < 1$ if and only if $A^{-1} \ge O$.
(b) If the splitting is an H-splitting, then A and M are H-matrices and $\rho(M^{-1}N) \le \rho(\langle M\rangle^{-1}|N|) < 1$.
(c) If the splitting is an H-compatible splitting and A is an H-matrix, then it is an H-splitting and thus convergent.
Proof. (a) can be found, e.g., in [1], [29], [40]. The first part of (b) was shown in [24], [25]. The second part, as well as (c), is found in [14].
Lemma 2.4. [34] Let $H_1, H_2, \ldots, H_i, \ldots$ be a sequence of nonnegative matrices in $\mathbb{R}^{n\times n}$. If there exist a real number $0 \le \theta < 1$ and a vector $v > 0$ in $\mathbb{R}^n$ such that $H_j v \le \theta v$, for all $j = 1, 2, \ldots$, then $\rho(V_i) \le \theta^i < 1$, where $V_i = H_i \cdots H_2 H_1$, and $\lim_{i\to\infty} V_i = O$.
3. Synchronous Iterations. In this section we give convergence results for the Non-stationary Two-stage Multisplitting Algorithm 4 and its relaxed variant, Algorithm 5. We first show that if the outer and inner splittings (1.4) and (1.9) are convergent, and if enough inner iterations are performed, then the algorithms converge. Later we show that if certain conditions are imposed on the splittings, the methods converge for any number of inner iterations.
Let $x^\star$ be the solution of (1.1) and let $e_i = x^\star - x_i$ be the error at the ith outer iteration of Algorithm 4. Let $R_\ell = B_\ell^{-1}C_\ell$. We rewrite (1.11) as

$x_i = \sum_{\ell=1}^{L} E_\ell \Big[ R_\ell^{s(\ell,i)} x_{i-1} + \sum_{j=0}^{s(\ell,i)-1} R_\ell^j B_\ell^{-1}(N_\ell x_{i-1} + b) \Big];$

cf. [15], [22]. Thus,

(3.1)   $e_i = H(i)e_{i-1} = H(i)H(i-1)\cdots H(1)e_0,$

where

(3.2)   $H(i) = \sum_{\ell=1}^{L} E_\ell \Big[ R_\ell^{s(\ell,i)} + \sum_{j=0}^{s(\ell,i)-1} R_\ell^j B_\ell^{-1} N_\ell \Big] = \sum_{\ell=1}^{L} E_\ell \big[ R_\ell^{s(\ell,i)} + (I - R_\ell^{s(\ell,i)}) M_\ell^{-1} N_\ell \big] = I - Q(i)A,$

with

(3.3)   $Q(i) = \sum_{\ell=1}^{L} E_\ell (I - R_\ell^{s(\ell,i)}) M_\ell^{-1} = \sum_{\ell=1}^{L} E_\ell \sum_{j=0}^{s(\ell,i)-1} R_\ell^j B_\ell^{-1} \ge O.$
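The identity $H(i) = I - Q(i)A$ can be verified numerically on small examples. The sketch below assembles H(i) and Q(i) under the same illustrative choices as before (Jacobi inner splittings, a hypothetical assumption); it also assumes the $E_\ell$ sum to the identity, which the derivation of (3.2) uses.

```python
import numpy as np

def iteration_matrix(A, Ms, Es, s_li):
    """Assemble H(i) of (3.2) and Q(i) of (3.3); s_li[l] = s(l, i).
    Assumes sum_l E_l = I and, illustratively, B_l = diag(M_l)."""
    n = A.shape[0]
    I = np.eye(n)
    H = np.zeros((n, n))
    Q = np.zeros((n, n))
    for l, (M, E) in enumerate(zip(Ms, Es)):
        N = M - A
        B = np.diag(np.diag(M))
        R = np.linalg.solve(B, B - M)             # R_l = B_l^{-1} C_l
        Rs = np.linalg.matrix_power(R, s_li[l])   # R_l^{s(l,i)}
        H += E @ (Rs + (I - Rs) @ np.linalg.solve(M, N))
        Q += E @ (I - Rs) @ np.linalg.inv(M)
    return H, Q  # H should equal I - Q @ A up to rounding error
```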
Similarly, if $e_i = x^\star - x_i$ is the error at the ith outer iteration of Algorithm 5 and

(3.4)   $S_\ell = S_\ell(\omega) = (1 - \omega)I + \omega R_\ell,$

it follows from (1.12) and (1.11) that $e_i = T(i)e_{i-1} = T(i)T(i-1)\cdots T(1)e_0$, where

(3.5)   $T(i) = \sum_{\ell=1}^{L} E_\ell \Big[ S_\ell^{s(\ell,i)} + \omega \sum_{j=0}^{s(\ell,i)-1} S_\ell^j B_\ell^{-1} N_\ell \Big],$

i.e., the iteration matrix for Algorithm 5 is similar to (3.2) of Algorithm 4, where $R_\ell$ is replaced by $S_\ell$.
Theorem 3.1. Let A be nonsingular. Let the splittings (1.4) be such that

(3.6)   $\|M_\ell^{-1} N_\ell\|_\infty < 1, \qquad \ell = 1, \ldots, L,$

and let the splittings (1.9) be convergent. Assume further that $\lim_{i\to\infty} s(\ell, i) = \infty$, $\ell = 1, \ldots, L$. Then, the Non-stationary Two-stage Multisplitting Algorithm 4 converges to $x^\star$ with $Ax^\star = b$. If, in addition, we assume that $0 < \omega < \frac{2}{1+\rho}$, where $\rho = \max\{\rho(B_\ell^{-1}C_\ell),\ 1 \le \ell \le L\}$, then the theorem holds for the Relaxed Non-stationary Two-stage Multisplitting Algorithm 5.
Proof. Consider first $\omega = 1$, i.e., Algorithm 4. Let $R_\ell = B_\ell^{-1}C_\ell$. Since $\rho(R_\ell) < 1$, $\lim_{i\to\infty} R_\ell^i = O$, for $\ell = 1, \ldots, L$. Then, given an $\epsilon > 0$, there exists an integer $s_0$ such that $\|R_\ell^s\|_\infty \le \epsilon$, for all $s \ge s_0$, $\ell = 1, \ldots, L$. Since $\lim_{i\to\infty} s(\ell,i) = \infty$, there exists an $i_0$ such that $\|R_\ell^{s(\ell,i)}\|_\infty \le \epsilon$, for all $i \ge i_0$, $\ell = 1, \ldots, L$. Let $\beta$ be a real constant such that $\|M_\ell^{-1}N_\ell\|_\infty \le \beta < 1$, for $\ell = 1, \ldots, L$. The existence of such $\beta$ follows from (3.6). Then, for $i \ge i_0$,

$\|H(i)\|_\infty \le \max_{1\le\ell\le L} \big\| R_\ell^{s(\ell,i)} + (I - R_\ell^{s(\ell,i)}) M_\ell^{-1} N_\ell \big\|_\infty \le \max_{1\le\ell\le L} \big[ \|R_\ell^{s(\ell,i)}\|_\infty + \big(1 + \|R_\ell^{s(\ell,i)}\|_\infty\big) \|M_\ell^{-1}N_\ell\|_\infty \big] \le \max_{1\le\ell\le L} \big[\epsilon + (1 + \epsilon)\|M_\ell^{-1}N_\ell\|_\infty\big] \le \epsilon + (1 + \epsilon)\beta \equiv \alpha.$

Setting $\epsilon < \frac{1-\beta}{1+\beta}$, we have $\alpha < 1$ and the errors (3.1) converge to zero.

For the case of Algorithm 5, i.e., for $\omega \neq 1$, observe that it follows from (3.4) that

(3.7)   $\rho(S_\ell(\omega)) < 1, \quad \text{for } \ell = 1, \ldots, L,$

and the proof follows in the same way.
The proof of Theorem 3.1 resembles that of [14, Theorem 2.4]. Here we need the additional hypothesis (3.6) since we have L different splittings. We also note that Theorem 3.1 can be proved in the same way if the norm in assumption (3.6) is replaced by any norm such that for arbitrary matrices U, $U_\ell$, and weighting matrices $E_\ell$, $\ell = 1, \ldots, L$, with $U = \sum_{\ell=1}^{L} E_\ell U_\ell$, one has $\|U\| \le \max_{1\le\ell\le L} \|U_\ell\|$; see Bru and Fuster [4]. In particular, one can use any weighted max-norm associated with a positive vector; see, e.g., Householder [20], Rheinboldt and Vandergraft [33], or Frommer and Szyld [15], for descriptions and applications of these norms.
Theorem 3.2. Let $A^{-1} \ge O$. Let the splittings (1.4) be regular and the splittings (1.9) be weak regular. Then, the Non-stationary Two-stage Multisplitting Algorithm 4 converges to $x^\star$ with $Ax^\star = b$ for any initial vector $x_0$ and any sequence of numbers of inner iterations $s(\ell,i) \ge 1$, $\ell = 1, \ldots, L$, $i = 1, 2, \ldots$. If, in addition, we assume $0 < \omega < 1$, the theorem holds for the Relaxed Non-stationary Two-stage Multisplitting Algorithm 5.
Proof. Let $0 < \omega \le 1$. Algorithm 4 corresponds to $\omega = 1$, while Algorithm 5 corresponds to $\omega < 1$. Let $e_i = x^\star - x_i$ be the error at the ith outer iteration of either Algorithm 4 or Algorithm 5. We rewrite (3.5) as

(3.8)   $T(i) = \sum_{\ell=1}^{L} E_\ell\, T_\ell(i),$

where

(3.9)   $T_\ell(i) = S_\ell^{s(\ell,i)} + \omega \sum_{j=0}^{s(\ell,i)-1} S_\ell^j B_\ell^{-1} N_\ell \ge O,$

where the inequality follows from the assumptions on the outer and inner splittings. Similarly to (3.2) and (3.3), it follows from (3.9) that

(3.10)   $T_\ell(i) = I - P_\ell(i)A,$

where $P_\ell(i) = (I - S_\ell^{s(\ell,i)}) M_\ell^{-1} = \sum_{j=0}^{s(\ell,i)-1} S_\ell^j (I - S_\ell) M_\ell^{-1}$. From (3.4) it follows that $I - S_\ell = \omega B_\ell^{-1} M_\ell$, and thus,

(3.11)   $P_\ell(i) = \omega \sum_{j=0}^{s(\ell,i)-1} S_\ell^j B_\ell^{-1}.$

Consider any fixed vector $e > 0$ (e.g., with all components equal to 1), and $v = A^{-1}e$. Since $A^{-1} \ge O$ and no row of $A^{-1}$ can have all null entries, we get $v > 0$. By the same arguments $B_\ell^{-1} e > 0$, $\ell = 1, \ldots, L$. We have from (3.10) and (3.11) that

$T_\ell(i)v = (I - P_\ell(i)A)v = v - P_\ell(i)e = v - \omega B_\ell^{-1} e - \omega \sum_{j=1}^{s(\ell,i)-1} S_\ell^j B_\ell^{-1} e.$

We have that $\sum_{j=1}^{s(\ell,i)-1} S_\ell^j B_\ell^{-1} e \ge 0$. Moreover, since $T_\ell(i)v \ge 0$ and $v - \omega B_\ell^{-1} e < v$, there exist constants $0 \le \theta_\ell < 1$ such that $v - \omega B_\ell^{-1} e \le \theta_\ell v$. Thus, $T_\ell(i)v \le \theta_\ell v$ for all $\ell = 1, \ldots, L$, $i = 1, 2, \ldots$. Let $\theta = \max\{\theta_\ell,\ 1 \le \ell \le L\}$. From (3.8) we then have that

(3.12)   $T(i)v \le \theta v, \quad \text{for all } i = 1, 2, \ldots.$

By Lemma 2.4 this implies that the product $V(i) = T(i) \cdot T(i-1) \cdots T(1)$ tends to zero as $i \to \infty$, and thus $\lim_{i\to\infty} e_i = 0$.
The proof of Theorem 3.2 uses techniques similar to the ones used in Theorem 2.1 of [2], in Theorems 4.3 and 4.4 of [14], and in Theorem 2.2 of [15]. We point out that the bounds (3.12) are independent of the sequence $s(\ell,i) \ge 1$, $\ell = 1, \ldots, L$, $i = 1, 2, \ldots$.

Theorem 3.3. Let A be an H-matrix. Let the splittings (1.4) and (1.9) be H-compatible splittings. Then, the Non-stationary Two-stage Multisplitting Algorithm 4 converges to $x^\star$ with $Ax^\star = b$ for any initial vector $x_0$ and any sequence of numbers of inner iterations $s(\ell,i) \ge 1$, $\ell = 1, \ldots, L$, $i = 1, 2, \ldots$. If, in addition, we assume $0 < \omega < 1$, the theorem holds for the Relaxed Non-stationary Two-stage Multisplitting Algorithm 5.
Proof. Let $0 < \omega \le 1$. Algorithm 4 corresponds to $\omega = 1$, while Algorithm 5 corresponds to $\omega < 1$. Let $e_i = x^\star - x_i$ be the error at the ith outer iteration of either Algorithm 4 or Algorithm 5. From (3.4), using Lemma 2.1 (b), we obtain the following bound:

$|S_\ell| \le (1-\omega)I + \omega|B_\ell^{-1}||C_\ell| \le (1-\omega)I + \omega\langle B_\ell\rangle^{-1}|C_\ell| \equiv \hat{S}_\ell.$

Thus, from (3.5), and again using Lemma 2.1 (b), we obtain

(3.13)   $|T(i)| \le \sum_{\ell=1}^{L} E_\ell \Big[ \hat{S}_\ell^{s(\ell,i)} + \omega \sum_{j=0}^{s(\ell,i)-1} \hat{S}_\ell^j \langle B_\ell\rangle^{-1} |N_\ell| \Big] \equiv \hat{T}(i).$

The matrix $\hat{T}(i)$ is the iteration matrix corresponding to the ith iteration of a Relaxed Non-stationary Two-stage Multisplitting Algorithm for the monotone matrix $\langle A\rangle = \langle M_\ell\rangle - |N_\ell|$ with the regular splittings $\langle M_\ell\rangle - |N_\ell|$ and $\langle M_\ell\rangle = \langle B_\ell\rangle - |C_\ell|$. These matrices and splittings satisfy the hypotheses of Theorem 3.2 and we have, as in (3.12),

(3.14)   $|T(i)|v \le \hat{T}(i)v \le \theta v, \quad \text{for all } i = 1, 2, \ldots,$

for some $v \in \mathbb{R}^n$, $v > 0$, and $\theta \in [0,1)$. Let $V(i) = T(i) \cdot T(i-1) \cdots T(1)$. We can bound $|V(i)| \le |T(i)| \cdot |T(i-1)| \cdots |T(1)|$. Therefore, by (3.14) and Lemma 2.4, V(i) tends to zero as $i \to \infty$, implying $\lim_{i\to\infty} e_i = 0$.

The proof of Theorem 3.3 follows from that of Theorem 3.2 in a way similar to the way that [15, Theorem 2.3] follows from [15, Theorem 2.2]. Here we need the additional hypothesis that the outer splittings be H-compatible, so that $\langle A\rangle = \langle M_\ell\rangle - |N_\ell|$ for all $\ell = 1, \ldots, L$.
4. Asynchronous Iterations. All methods described in section 1 and studied in section 3 are synchronous in the sense that step (1.11) is performed only after all approximations to the solutions of (1.5) are completed ($\ell = 1, \ldots, L$). Alternatively, if the weighting matrices form a partition of the identity, each part of $x_i$, say $x_i^{(\ell)} = E_\ell x_i$, can be updated as soon as the approximation to the solution of the corresponding system (1.5) is completed, without waiting for the other parts of $x_i$ to be updated. Thus, the previous iterate $x_{i-1}$ is no longer available for the computation of (1.5) or (1.10). Instead, parts of the current iterate are updated using a vector composed of parts of different previous, not necessarily the latest, iterates; cf. [2, Model B], [15, section 3], [16, section 4].

As is customary in the description and analysis of asynchronous algorithms, the iteration subscript is increased every time any part of the iteration vector is computed; see, e.g., the references in [2], [6], [8], [9], [15], [16]. In a formal way, the sets $J_i \subseteq \{1, 2, \ldots, L\}$, $i = 1, 2, \ldots$, are defined by $\ell \in J_i$ if the $\ell$th part of the iteration vector is computed at the ith step. The subscripts $r(k,i)$ are used to denote the iteration number of the kth part being used in the computation of any part in the ith iteration, i.e., the iteration number of the kth part available at the beginning of the computation of $x_i^{(\ell)}$, if $\ell \in J_i$.
Any arbitrary $n \times n$ matrix U can be decomposed into L $n \times n$ matrices $U^{(\ell)}$, $\ell = 1, \ldots, L$, so that for any vector u, $Uu = \sum_{\ell=1}^{L} U^{(\ell)} u$, where the nonzeros in $U^{(\ell)} u$ correspond to the nonzero elements in $E_\ell$, which form a partition of the identity, i.e., $U^{(\ell)} = E_\ell U$. With this notation, we write the following algorithm.
Algorithm 6. (Outer Asynchronous Two-stage Multisplitting). Given the initial vector $x_0 = x_0^{(1)} + \cdots + x_0^{(L)}$.
For $i = 1, 2, \ldots$

(4.1)   $x_i^{(\ell)} = \begin{cases} x_{i-1}^{(\ell)} & \text{if } \ell \notin J_i \\[2pt] H^{(\ell)}(i)\big(x_{r(1,i)}^{(1)} + \cdots + x_{r(L,i)}^{(L)}\big) + Q^{(\ell)}(i)\, b & \text{if } \ell \in J_i, \end{cases}$

with H(i) and Q(i) as defined in (3.2) and (3.3), respectively.
For easy comparison with Algorithm 8, we rewrite (4.1) explicitly as

(4.2)   $x_i^{(\ell)} = \begin{cases} x_{i-1}^{(\ell)} & \text{if } \ell \notin J_i \\[2pt] E_\ell \Big[ R_\ell^{s(\ell,i)} x_{r(\ell,i)}^{(\ell)} + \sum_{j=0}^{s(\ell,i)-1} R_\ell^j B_\ell^{-1} \Big( N_\ell \sum_{k=1}^{L} x_{r(k,i)}^{(k)} + b \Big) \Big] & \text{if } \ell \in J_i. \end{cases}$
We always assume that the asynchronous iterations satisfy the following conditions, which are very natural in asynchronous computations; see, e.g., [9], [15].

(4.3)   $r(\ell,i) < i$ for all $\ell = 1, \ldots, L$, $i = 1, 2, \ldots$.
(4.4)   $\lim_{i\to\infty} r(\ell,i) = \infty$ for all $\ell = 1, \ldots, L$.
(4.5)   The set $\{i \mid \ell \in J_i\}$ is unbounded for all $\ell = 1, \ldots, L$.

Condition (4.3) simply states that only components previously computed are used, and not future ones. Condition (4.5) is equivalent to what Bru, Elsner and Neumann [2] and others, e.g., Bru, Migallón and Penadés [6], call an admissible sequence, i.e., implying that every component is updated infinitely often. These authors also use the concept of a regulated sequence, i.e., implying that the number of iterations between two updates of the same component is uniformly bounded. Our condition (4.4) is slightly more general since no such bound is assumed.
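A toy sequential emulation of Algorithm 6 under conditions (4.3)–(4.5) may clarify the bookkeeping. Everything below is an assumption for illustration: the sets $J_i$ and the delays are drawn at random (bounded by `max_delay`, which is stronger than (4.4) requires), and the inner splittings are again Jacobi.

```python
import numpy as np

def outer_async_simulation(A, b, Ms, Es, s, steps, max_delay=3, seed=0):
    """Sequential emulation of Algorithm 6: at step i a random subset J_i
    of blocks is updated from randomly delayed iterates x_{r(k,i)}."""
    rng = np.random.default_rng(seed)
    L = len(Ms)
    x = np.zeros_like(b)
    history = [x.copy()]                 # history[r] stores iterate r
    for i in range(1, steps + 1):
        J = rng.choice(L, size=rng.integers(1, L + 1), replace=False)
        x = x.copy()
        for l in J:
            # assemble the delayed vector sum_k x_{r(k,i)}^{(k)}, r(k,i) < i
            z = np.zeros_like(b)
            for k in range(L):
                r = max(0, i - 1 - rng.integers(0, max_delay))
                z += Es[k] @ history[r]
            M, E = Ms[l], Es[l]
            N = M - A
            B = np.diag(np.diag(M))      # illustrative Jacobi inner splitting
            y = z.copy()
            for _ in range(s):           # s inner iterations as in (4.2)
                y = np.linalg.solve(B, (B - M) @ y + N @ z + b)
            x = x - E @ x + E @ y        # overwrite only the l-th part
        history.append(x.copy())
    return x
```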
If in (4.1), i.e., in Algorithm 6, we replace H(i) by T(i) as defined in (3.5), using $S_\ell(\omega)$ as defined in (3.4), $\omega > 0$, and if we replace Q(i) by $P(i) = \sum_{\ell=1}^{L} E_\ell P_\ell(i)$, where the $P_\ell(i)$ are as in (3.11), we obtain a Relaxed Outer Asynchronous Two-stage Multisplitting Algorithm (Algorithm 7). This algorithm, which is the asynchronous version of Algorithm 5, can also be obtained by replacing $R_\ell$ by $S_\ell(\omega)$ in (4.2).
Theorem 4.1. Let $A^{-1} \ge O$. Let the splittings (1.4) be regular and the splittings (1.9) be weak regular. Assume that the sequence $r(\ell,i)$ and the sets $J_i$, $\ell = 1, \ldots, L$, $i = 1, 2, \ldots$, satisfy conditions (4.3)–(4.5). Then, the Outer Asynchronous Two-stage Multisplitting Algorithm 6 converges to $x^\star$ with $Ax^\star = b$ for any initial vector $x_0$ and for any sequence of numbers of inner iterations $s(\ell,i) \ge 1$, $\ell = 1, \ldots, L$, $i = 1, 2, \ldots$. If, in addition, we assume $0 < \omega < 1$, the theorem holds for the Relaxed Outer Asynchronous Two-stage Multisplitting Algorithm 7.

Proof. The proof follows in the same way as the proof of [15, Theorem 3.3], which in turn is based on [8, Theorem 3.4].
Theorem 4.2. Let A be an H-matrix. Let the splittings (1.4) and (1.9) be H-compatible splittings. Assume that the sequence $r(\ell,i)$ and the sets $J_i$, $\ell = 1, \ldots, L$, $i = 1, 2, \ldots$, satisfy conditions (4.3)–(4.5). Then, the Outer Asynchronous Two-stage Multisplitting Algorithm 6 converges to $x^\star$ with $Ax^\star = b$ for any initial vector $x_0$ and for any sequence of numbers of inner iterations $s(\ell,i) \ge 1$, $\ell = 1, \ldots, L$, $i = 1, 2, \ldots$. If, in addition, we assume $0 < \omega < 1$, the theorem holds for the Relaxed Outer Asynchronous Two-stage Multisplitting Algorithm 7.

Proof. The proof follows in the same way as the proof of [15, Theorem 3.4].
We consider now asynchronous two-stage multisplitting algorithms where, at each inner iteration, the most recent information from the other parts of the iterate is used. In other words, the parts $x_{r(k,i)}^{(k)}$ in the sum in (4.2) may differ for different values of j, $j = 0, \ldots, s(\ell,i)-1$ ($\ell \in J_i$). To reflect this, we therefore use indices of the form $r(k,j,i)$. Thus, for example, if the subvector $x_i^{(1)} = E_1 x_i$ is being computed by one processor, one component at a time, the computed components can be read by the other processors and used in the computation of the other subvectors $x^{(\ell)}$, before all components of $x_i^{(1)}$ are computed, and $x_{r(1,j,i)}^{(1)}$ may change for different j, $j = 0, \ldots, s(\ell,i)-1$ ($\ell \in J_i$), while $x_i^{(1)}$ is being computed. These algorithms are called totally asynchronous to distinguish them from the outer asynchronous ones; see [15, section 4] for a discussion of possible advantages of these methods.
Algorithm 8. (Totally Asynchronous Two-stage Multisplitting). Given the initial vector $x_0 = x_0^{(1)} + \cdots + x_0^{(L)}$.
For $i = 1, 2, \ldots$

(4.6)   $x_i^{(\ell)} = \begin{cases} x_{i-1}^{(\ell)} & \text{if } \ell \notin J_i \\[2pt] E_\ell \Big[ R_\ell^{s(\ell,i)} x_{r(\ell,0,i)}^{(\ell)} + \sum_{j=0}^{s(\ell,i)-1} R_\ell^j B_\ell^{-1} \Big( N_\ell \sum_{k=1}^{L} x_{r(k,j,i)}^{(k)} + b \Big) \Big] & \text{if } \ell \in J_i. \end{cases}$
Analogous to (4.3)–(4.4), we now assume

(4.7)   $\begin{cases} r(k,j,i) < i, & \text{for all } k = 1, \ldots, L,\ j = 0, \ldots, s(k,i)-1,\ i = 1, 2, \ldots, \\[2pt] \displaystyle\lim_{i\to\infty}\ \min_{j=0,\ldots,s(k,i)-1} r(k,j,i) = \infty, & \text{for all } k = 1, \ldots, L. \end{cases}$
Again, if $R_\ell$ is replaced by $S_\ell(\omega)$ in (4.6), we obtain a Relaxed Totally Asynchronous Two-stage Multisplitting Algorithm (Algorithm 9).
Theorem 4.3. Let $A^{-1} \ge O$. Let the splittings (1.4) be regular and the splittings (1.9) be weak regular. Assume that the numbers $r(k,j,i)$ and the sets $J_i$, $k = 1, \ldots, L$, $i = 1, 2, \ldots$, satisfy conditions (4.5) and (4.7). Then, the Totally Asynchronous Two-stage Multisplitting Algorithm 8 converges to $x^\star$ with $Ax^\star = b$ for any initial vector $x_0$ and for any sequence of numbers of inner iterations $s(\ell,i) \ge 1$, $\ell = 1, \ldots, L$, $i = 1, 2, \ldots$. If, in addition, we assume $0 < \omega < 1$, the theorem holds for the Relaxed Totally Asynchronous Two-stage Multisplitting Algorithm 9.

Proof. The proof follows in the same way as the proof of [15, Theorem 4.3], which in turn is based on [9, Theorem 2.1].
Theorem 4.4. Let A be an H-matrix. Let the splittings (1.4) and (1.9) be H-compatible splittings. Assume that the numbers $r(k,j,i)$ and the sets $J_i$, $k = 1, \ldots, L$, $i = 1, 2, \ldots$, satisfy conditions (4.5) and (4.7). Then, the Totally Asynchronous Two-stage Multisplitting Algorithm 8 converges to $x^\star$ with $Ax^\star = b$ for any initial vector $x_0$ and for any sequence of numbers of inner iterations $s(\ell,i) \ge 1$, $\ell = 1, \ldots, L$, $i = 1, 2, \ldots$. If, in addition, we assume $0 < \omega < 1$, the theorem holds for the Relaxed Totally Asynchronous Two-stage Multisplitting Algorithm 9.

Proof. The proof follows in the same way as the proof of [15, Theorem 4.4].
5. Numerical Experiments. We implemented the synchronous and outer asynchronous algorithms described in this paper on a shared memory computer, an Alliant FX/80 with 8 vector processors. We considered the problem of Laplace's equation with Dirichlet boundary conditions on the rectangle $\Omega = [0,a] \times [0,b]$. We discretized the domain $\Omega$, using five-point finite differences, with $J \times K$ points equally spaced by h. This discretization yields the linear system (1.1), where A is block tridiagonal, $A = \mathrm{tridiag}[-I, B, -I]$, where I and B are $K \times K$ matrices, I is the identity, and $B = \mathrm{tridiag}[-1, 4, -1]$. Thus, A has $J \times J$ blocks of size $K \times K$. It follows directly that A is an M-matrix, and thus also an H-matrix.
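The test matrix is easy to assemble; a sketch with J and K as parameters (dense storage for simplicity, whereas an actual code would use a sparse format):

```python
import numpy as np

def laplace_matrix(J, K):
    """A = tridiag[-I, B, -I] with J x J blocks of size K x K and
    B = tridiag[-1, 4, -1]: the 5-point Laplacian on a J x K grid."""
    n = J * K
    A = np.zeros((n, n))
    B = 4.0 * np.eye(K) - np.eye(K, k=1) - np.eye(K, k=-1)
    I = np.eye(K)
    for j in range(J):
        A[j*K:(j+1)*K, j*K:(j+1)*K] = B
        if j > 0:
            A[j*K:(j+1)*K, (j-1)*K:j*K] = -I
            A[(j-1)*K:j*K, j*K:(j+1)*K] = -I
    return A
```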
In order to fully utilize the 8 processors, we let L = 8. The matrix A can be written as

$A = \begin{pmatrix} D_1 & -I & & \\ -I & D_2 & -I & \\ & \ddots & \ddots & \ddots \\ & & -I & D_8 \end{pmatrix},$
where $D_\ell = \mathrm{tridiag}[-I, B, -I]$ is of order $n_\ell$, $\ell = 1, \ldots, 8$; cf. (1.2). We consider an (outer) Block Jacobi algorithm, and thus we have the matrices $M_\ell = \mathrm{Diag}[D_1, D_2, \ldots, D_8]$, $\ell = 1, \ldots, 8$, and $N_\ell$ such that $A = M_\ell - N_\ell$. The matrices $E_\ell$, $\ell = 1, \ldots, 8$, are partitioned according to the structure of the matrices $M_\ell$, with the $\ell$th block equal to the identity block and zeros elsewhere, i.e., they form a partition of the identity; see the discussion after Algorithm 2. Thus, when we perform $s(\ell,i)$ steps of an iterative method to solve the linear system $M_\ell y = N_\ell x_{i-1} + b$, we only need to update the $\ell$th block of y; see (1.5) and (1.6). In the experiments reported here, $s(\ell,i)$ steps of the SOR method are performed to approximate the solutions of the linear systems (1.5).
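The inner SOR solve per block can be written as s sweeps with the SOR splitting of $M_\ell$; a sketch (the assignment of blocks to processors and the outer convergence test are omitted):

```python
import numpy as np

def sor_steps(M, rhs, y, s, omega):
    """s SOR sweeps on M y = rhs, i.e., the inner splitting
    B = (1/omega) D - Lo, with D the diagonal and Lo the (negated)
    strictly lower triangular part of M."""
    D = np.diag(np.diag(M))
    Lo = -np.tril(M, -1)            # M = D - Lo - Up
    B = D / omega - Lo
    C = B - M                       # C = (1/omega - 1) D + Up
    for _ in range(s):
        y = np.linalg.solve(B, C @ y + rhs)
    return y
```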
Experiments were performed with matrices of different orders. The results are similar for all matrices (see [32]), and therefore we only present here results for n = 5632. The matrix A has J = 11 diagonal blocks B of size K = 512, and we chose $n_\ell = 1024$ for $\ell = 1, 2, 3$, and $n_\ell = 512$ for $\ell = 4, \ldots, 8$, to study the effect of different numbers of nonzeros in $E_\ell$. With this choice, we can obtain a good load balance by having processors 4–8 perform twice as many (inner) iterations as processors 1–3. Recall that in this example the number of nonzeros is proportional to the order of the blocks. This example is similar to the situation in which the diagonal blocks correspond to natural physical structures of different size, or to a discretization of varying resolution.
We report the CPU time in seconds (for the concurrent compilation) as a function of the relaxation parameter $\omega > 0$ for selected values of the non-stationary parameters $s(\ell,i)$. In Figure 5.1 the results for the synchronous algorithms are presented. We use the notation $2_3 4_5$ to represent the non-stationary parameters $s(\ell,i) = 2$ for $\ell = 1, 2, 3$ and $s(\ell,i) = 4$ for $\ell = 4, \ldots, 8$, for all i, i.e., to represent that each of the first three processors updates its vector twice and each of the last five processors updates its vector four times, in any outer iteration. Similar notation is used for other values of the non-stationary parameters. In Figure 5.1 it can be observed that the CPU time for the algorithm with one inner iteration ($s(\ell,i) = 1$, or $1_8$) is larger than that for the other cases, both in the stationary case with more inner iterations
or in the non-stationary cases.

Fig. 5.1. Synchronous algorithms, n = 5632. (CPU time in seconds versus the relaxation parameter $\omega$; curves for the synchronous non-stationary parameters $2_3 4_5$ and $1_3 2_5$ and the synchronous stationary parameters $2_8$ and $1_8$.)
We point out that in the proofs of theorems 3.2–4.4, the restriction ω ≤ 1 is
needed so that (3.7) holds. Nevertheless, as can be seen in the experiments reported
here, the conclusions of the theorems hold in these examples for certain ω > 1.
Fig. 5.2. Asynchronous algorithms, n = 5632. (CPU time in seconds versus the relaxation parameter $\omega$; curves for the asynchronous non-stationary parameters $2_3 4_5$ and $1_3 2_5$ and the asynchronous stationary parameters $2_8$ and $1_8$.)
In Figure 5.2 the CPU times of the corresponding outer asynchronous algorithms are presented. In this case, the non-stationary algorithms behave better than the stationary ones. Observe that when $\omega$ is optimal, all versions give similar times, but when $\omega$ is less than the optimal value, in particular when $\omega = 1$, the non-stationary schemes produce faster convergence. Finally, we note that, with the choice of the orders $n_\ell$ used to achieve good load balancing, the asynchronous algorithms are better than the synchronous algorithms. Figure 5.3 illustrates this for the non-stationary parameters $2_3 4_5$.
Fig. 5.3. Synchronous and asynchronous algorithms, n = 5632. (CPU time in seconds versus the relaxation parameter $\omega$; curves for the synchronous and asynchronous non-stationary parameters $2_3 4_5$.)
6. Conclusion. We have shown that, under certain conditions, the synchronous and asynchronous two-stage multisplitting algorithms presented in this paper converge for any number of inner iterations $s(\ell,i)$. This theory permits the use of different stopping criteria for the inner iterations, e.g., by specifying a certain tolerance for the inner residual; cf. Golub and Overton [17], [18].

The choice of optimal sequences $s(\ell,i)$, or simply good ones, is problem dependent and not very well understood. In our experience, a few inner iterations often produce good overall convergence results. Moreover, on multiprocessors, a good choice for this sequence is one which balances the work across the processors, producing a good overall load balance. The experiments in section 5 illustrate such choices.
Acknowledgments. We acknowledge and appreciate the referees’ questions and
recommendations which led to improvements in the paper.
REFERENCES
[1] Abraham Berman and Robert J. Plemmons, Nonnegative Matrices in the Mathematical
Sciences. Academic Press, New York, third edition, 1979. Reprinted by SIAM, Philadelphia,
1994.
[2] Rafael Bru, Ludwig Elsner, and Michael Neumann, Models of parallel chaotic iteration
methods, Linear Algebra Appl., 103 (1988), pp. 175–192.
[3] Rafael Bru, Ludwig Elsner, and Michael Neumann, Convergence of infinite products of matrices and inner-outer iteration schemes, Electron. Trans. Numer. Anal., 2 (1994), pp. 183–193.
[4] Rafael Bru and Robert Fuster, Parallel chaotic extrapolated Jacobi method, Appl. Math.
Lett., 3 (1990), pp. 65–69.
[5] Rafael Bru, Violeta Migallón, and José Penadés, Chaotic inner-outer iterative schemes, in Proc. of the Fifth SIAM Conference on Applied Linear Algebra, J. G. Lewis, ed., SIAM Press, Philadelphia, 1994, pp. 434–438.
[6] Rafael Bru, Violeta Migallón, and José Penadés, Chaotic methods for the parallel solution of linear systems, Comput. Systems Engrg., to appear.
[7] Wang Deren, On the convergence of the parallel multisplitting AOR method, Linear Algebra
Appl., 154 (1991), pp. 473–486.
[8] Mouhamed Nabih El Tarazi, Some convergence results for asynchronous algorithms, Numer.
Math., 39 (1982), pp. 325–340.
[9] Andreas Frommer, On asynchronous iterations in partially ordered spaces, Numer. Funct.
Anal. Optim., 12 (1991), pp. 315–325.
[10] Andreas Frommer and Günter Mayer, Convergence of relaxed parallel multisplitting methods, Linear Algebra Appl., 119 (1989), pp. 141–152.
[11] Andreas Frommer and Günter Mayer, On the theory and practice of multisplitting methods in parallel computation, Computing, 49 (1992), pp. 63–74.
[12] Andreas Frommer and Bert Pohl, A comparison result for multisplittings based on overlapping blocks and its application to waveform relaxation methods, Technical Report 93-05, Eidgenössische Technische Hochschule, Seminar für Angewandte Mathematik, Zurich, May 1993. Numer. Linear Algebra Appl., 2(4), (1995), to appear.
[13] Andreas Frommer and Bert Pohl, Comparison results for splittings based on overlapping blocks, in Proc. of the Fifth SIAM Conference on Applied Linear Algebra, J. G. Lewis, ed., SIAM Press, Philadelphia, 1994, pp. 29–33.
[14] Andreas Frommer and Daniel B. Szyld, H-splittings and two-stage iterative methods, Numer. Math., 63 (1992), pp. 345–356.
[15] Andreas Frommer and Daniel B. Szyld, Asynchronous two-stage iterative methods, Numer. Math., 69 (1994), pp. 141–153.
[16] Robert Fuster, Violeta Migallón, and José Penadés, Parallel chaotic extrapolated like-Jacobi methods, Linear Algebra Appl., to appear.
[17] Gene H. Golub and Michael L. Overton, Convergence of a two-stage Richardson iterative procedure for solving systems of linear equations, in Numerical Analysis, G. A. Watson, ed., Proc. of the Ninth Biennial Conference, Dundee, Scotland, 1981, Lecture Notes in Mathematics 912, Springer Verlag, New York, 1982, pp. 128–139.
[18] Gene H. Golub and Michael L. Overton, The convergence of inexact Chebyshev and Richardson iterative methods for solving linear systems, Numer. Math., 53 (1988), pp. 571–593.
[19] Apostolos Hadjidimos and Dimitrios Noutsos, On a matrix identity connecting iteration operators associated with a p-cyclic matrix, Linear Algebra Appl., 182 (1993), pp. 157–177.
[20] Alston S. Householder, The Theory of Matrices in Numerical Analysis, Blaisdell, Waltham, Mass., 1964. Reprinted by Dover, New York, 1975.
[21] Mark T. Jones and Daniel B. Szyld, Two-stage multisplitting methods with overlapping blocks, Technical Report 94-31, Department of Mathematics, Temple University, Philadelphia, Pa., March 1994. Also available as Technical Report CS-94-224, Computer Science Department, University of Tennessee, Knoxville, March 1994. Numer. Linear Algebra Appl., to appear.
[22] Paul J. Lanzkron, Donald J. Rose, and Daniel B. Szyld, Convergence of nested classical
iterative methods for linear systems, Numer. Math., 58 (1991), pp. 685–702.
[23] Ivo Marek and Daniel B. Szyld, Splittings of M -operators: Irreducibility and the index of
the iteration operator, Numer. Funct. Anal. Optim., 11 (1990), pp. 529–553.
[24] Günter Mayer, Comparison theorems for iterative methods based on strong splittings, SIAM
J. Numer. Anal., 24 (1987), pp. 215–227.
[25] Arnold Neumaier, New techniques for the analysis of linear interval equations, Linear Algebra
Appl., 58 (1984), pp. 273–325.
[26] Michael Neumann and Robert J. Plemmons, Convergence of parallel multisplitting iterative
methods for M -matrices, Linear Algebra Appl., 88 (1987), pp. 559–573.
[27] Nancy K. Nichols, On the convergence of two-stage iterative processes for solving linear
equations, SIAM J. Numer. Anal., 10 (1973), pp. 460–469.
[28] Dianne P. O'Leary and Robert E. White, Multi-splittings of matrices and parallel solution of linear systems, SIAM J. Algebraic Discrete Methods, 6 (1985), pp. 630–640.
[29] James M. Ortega and Werner C. Rheinboldt, Iterative Solution of Nonlinear Equations in
Several Variables, Academic Press, New York and London, 1970.
[30] Alexander M. Ostrowski, Über die Determinanten mit überwiegender Hauptdiagonale, Commentarii Mathematici Helvetici, 10 (1937), pp. 69–96.
[31] Alexander M. Ostrowski, Determinanten mit überwiegender Hauptdiagonale und die absolute Konvergenz von linearen Iterationsprozessen, Commentarii Mathematici Helvetici, 30 (1956), pp. 175–210.
[32] José Penadés, Métodos Iterativos Paralelos para la Resolución de Sistemas Lineales basados en Multiparticiones, PhD thesis, Departamento de Tecnología Informática y Computación, Universidad de Alicante, November 1993, in Spanish.
[33] Werner C. Rheinboldt and James S. Vandergraft, A simple approach to the Perron-Frobenius theory for positive operators on general partially-ordered finite-dimensional linear spaces, Math. Comp., 27 (1973), pp. 139–145.
[34] F. Robert, M. Charnay, and F. Musy, Itérations chaotiques série-parallèle pour des
équations non-linéaires de point fixe, Aplikace Matematiky, 20 (1975), pp. 1–38.
[35] Hans Schneider, Theorems on M -splittings of a singular M -matrix which depend on graph
structure, Linear Algebra Appl., 58 (1984), pp. 407–424.
[36] Xiaodi Sun, Convergence of parallel multisplitting chaotic iterative methods, Numer. Math. J.
Chinese Univ. (Gao Deng Xue Xiao Ji Suan Shu Xue Xue Bao), 14 (1992), pp. 183–187,
in Chinese.
[37] Daniel B. Szyld, Synchronous and asynchronous two-stage multisplitting methods, in Proc.
of the Fifth SIAM Conference on Applied Linear Algebra, J. G. Lewis, ed., SIAM Press,
Philadelphia, 1994, pp. 39–44.
[38] Daniel B. Szyld and Mark T. Jones, Two-stage and multisplitting methods for the parallel
solution of linear systems, SIAM J. Matrix Anal. Appl., 13 (1992), pp. 671–679.
[39] Richard S. Varga, Factorization and normalized iterative methods, in Boundary Problems in Differential Equations, Rudolph E. Langer, ed., The University of Wisconsin Press, Madison, 1960, pp. 121–142.
[40] Richard S. Varga, Matrix Iterative Analysis, Prentice-Hall, Englewood Cliffs, New Jersey, 1962.
[41] Richard S. Varga, On recurring theorems on diagonal dominance, Linear Algebra Appl., 13 (1976), pp. 1–9.
[42] Robert E. White, Block multisplittings and parallel iterative methods, Comput. Methods Appl. Mech. Engrg., 64 (1987), pp. 567–577.
[43] Robert E. White, Multisplitting with different weighting schemes, SIAM J. Matrix Anal. Appl., 10 (1989), pp. 481–493.
[44] Robert E. White, Multisplitting of a symmetric positive definite matrix, SIAM J. Matrix Anal. Appl., 11 (1990), pp. 69–82.