Graph grammar and algorithmic transformations

Lecture topics:
Exemplary 1D Finite Difference problem
Frontal and Multi-frontal Direct Solvers

EXEMPLARY SIMPLE 1D NUMERICAL PROBLEM
Find temperature distribution
Finite difference discretization such that

TRIDIAGONAL MATRIX FOR EXEMPLARY 1D PROBLEM

FRONTAL SOLVER
Frontal matrix
The frontal matrix focuses on forward elimination of the first row:
2nd row = 2nd row - 1/h^2 * 1st row
The first row has been eliminated. The frontal matrix now focuses on the elimination of the second row:
3rd row = 3rd row - [1/h^2] / [-2/h^2] * 2nd row

MULTI-FRONTAL SOLVER ALGORITHM
Construct multiple frontal matrices in such a way that they sum up to the full matrix. Variables must be split into parts.
First, all frontal matrices are constructed.
The first and second frontal matrices are summed into a new 3x3 frontal matrix. Its first and second rows are fully assembled.
The first column is eliminated by using the first row:
2nd row = 2nd row - [1/h^2] * 1st row
The third and fourth frontal matrices are summed into a new 3x3 frontal matrix. Now only the second row is fully assembled.
Change of the ordering.
Eliminate entries below the diagonal:
2nd row = 2nd row - [1/h^2] / [-2/h^2] * 1st row
3rd row = 3rd row - [1/h^2] / [-2/h^2] * 1st row
Why can I subtract the fully assembled 1st row from the not fully assembled 2nd and 3rd rows?
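A small numerical experiment may help with this question. The sketch below is only an illustration under stated assumptions: a grid of five nodes, a mesh size h = 0.25, and a 2x2 contribution per interval chosen so that neighbouring contributions sum to the rows (1/h^2, -2/h^2, 1/h^2). The helper names `scatter`, `eliminate` and `add` are hypothetical, and boundary conditions are left out. It compares two orders of work: assemble everything and then eliminate node 2, versus eliminate node 2 inside the merged frontal matrix and add the remaining contributions afterwards.

```python
h = 0.25            # assumed mesh size; any h > 0 would do
n = 5               # assumed number of grid nodes, numbered 0..4
c = 1.0 / h**2


def scatter(intervals):
    """Sum the assumed 2x2 contributions of the given intervals [i, i+1]."""
    A = [[0.0] * n for _ in range(n)]
    for i in intervals:
        A[i][i] -= c
        A[i][i + 1] += c
        A[i + 1][i] += c
        A[i + 1][i + 1] -= c
    return A


def eliminate(A, pivot):
    """Subtract multiples of row `pivot` from every row coupled to it."""
    A = [row[:] for row in A]
    for r in range(n):
        if r != pivot and A[r][pivot] != 0.0:
            m = A[r][pivot] / A[pivot][pivot]
            A[r] = [x - m * p for x, p in zip(A[r], A[pivot])]
    return A


def add(A, B):
    """Entry-wise sum of two matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]


# Order 1: assemble all four intervals, then eliminate node 2.
assemble_first = eliminate(scatter(range(n - 1)), pivot=2)

# Order 2: merge intervals [1,2] and [2,3] (row 2 is fully assembled, rows 1
# and 3 are not), eliminate node 2, then add the contributions still missing.
eliminate_first = add(eliminate(scatter([1, 2]), pivot=2), scatter([0, 3]))

max_diff = max(abs(x - y)
               for ra, rb in zip(assemble_first, eliminate_first)
               for x, y in zip(ra, rb))
print("largest difference between the two orders:", max_diff)
```

In this setting both orders produce the same matrix, because the multiplier depends only on the fully assembled pivot row and on the entry in the pivot column, and neither is touched by the contributions added later.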
This is because subtraction and addition are interchangeable: I now subtract from the not fully assembled 2nd row, and later I will add the remaining part. Moreover, the 1st row, which is used for the subtractions, contains only zeros in the other columns.
The fifth and sixth frontal matrices are summed into a new 3x3 frontal matrix. Now the second and third rows are fully assembled.
…

PARALLEL MULTI-FRONTAL SOLVER ALGORITHM
Processors 1-6: all frontal matrices are generated at the same time.
Summing up and elimination are executed at the same time over different pairs of frontal matrices.
[Figure: elimination tree with the root at the top, intermediate nodes below it, and the frontal matrices of Processors 1-6 at the bottom]
The algorithm is recursively repeated until we reach the root of the tree.
The algorithm results in an upper triangular matrix stored in a distributed manner.
Computational complexity (with the frontal matrices of one tree level processed at the same time) = height of the tree = log N (where N = #unknowns - 1)

SUMMARY (BASIC IDEA OF PARALLEL SOLVER)

PROCESSOR 0
$$\begin{bmatrix} 1 & 0 \\ \frac{1}{h^2} & -\frac{1}{h^2} \end{bmatrix} \begin{bmatrix} u_0 \\ u_1 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$$
2nd = 2nd - 1/h^2 * 1st
$$\begin{bmatrix} 1 & 0 \\ 0 & -\frac{1}{h^2} \end{bmatrix} \begin{bmatrix} u_0 \\ u_1 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$$
$$-\frac{2}{h^2} u_1 = -\frac{20}{h^2}$$
solve: $u_1 = 10$
$$\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} u_0 \\ u_1 \end{bmatrix} = \begin{bmatrix} 0 \\ 10 \end{bmatrix}$$

PROCESSOR 1
$$\begin{bmatrix} 1 & 0 \\ \frac{1}{h^2} & -\frac{1}{h^2} \end{bmatrix} \begin{bmatrix} u_2 \\ u_1 \end{bmatrix} = \begin{bmatrix} 20 \\ 0 \end{bmatrix}$$
2nd = 2nd - 1/h^2 * 1st
$$\begin{bmatrix} 1 & 0 \\ 0 & -\frac{1}{h^2} \end{bmatrix} \begin{bmatrix} u_2 \\ u_1 \end{bmatrix} = \begin{bmatrix} 20 \\ -\frac{20}{h^2} \end{bmatrix}$$
send $-\frac{1}{h^2} u_1 = -\frac{20}{h^2}$ to Processor 0
$u_1 = 10$ is sent back from Processor 0
$$\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} u_2 \\ u_1 \end{bmatrix} = \begin{bmatrix} 20 \\ 10 \end{bmatrix}$$

DIRECT SOLVERS STATE OF THE ART
For dense matrices: Gaussian elimination or LU factorization, A = L U, e.g. LAPACK (J. Dongarra) or the parallel version PLAPACK (R. van de Geijn)
For sparse matrices (e.g. resulting from Finite Element or Finite Difference Methods)
• frontal solvers
Irons B., 1970: A frontal solution program for finite-element analysis. International Journal for Numerical Methods in Engineering, 2, 5-32
• multi-frontal solvers
Duff I. S., Reid J. K., 1983: The multifrontal solution of indefinite sparse symmetric linear systems. ACM Transactions on Mathematical Software, 9, 302-325
Duff I. S., Reid J. K., 1984: The multifrontal solution of unsymmetric sets of linear equations. SIAM Journal on Scientific and Statistical Computing, 5, 633-641

PARALLEL DIRECT SOLVERS FOR SPARSE SYSTEMS STATE OF THE ART
For large sparse matrices (e.g. resulting from challenging FEM computations):
Sub-structuring method solvers, based on the domain decomposition paradigm
• parallel frontal solvers (sequential frontal solver over each sub-domain, as well as for the interface)
Scott J. A., 2003: Parallel Frontal Solvers for Large Sparse Linear Systems. ACM Transactions on Mathematical Software, 29, 4, 395-417
Walsh T., Demkowicz L., 1999: A Parallel frontal Solver for hp-Adaptive Finite Elements. TICAM Report 99-01
• parallel multi-frontal solvers
Multi-frontal Massively Parallel sparse direct Solver (MUMPS)
Amestoy P. R., Duff I. S., L'Excellent J.-Y., 2000: Multifrontal parallel distributed symmetric and unsymmetric solvers. Computer Methods in Applied Mechanics and Engineering, 184, 501-520
Amestoy P. R., Guermouche A., L'Excellent J.-Y., Pralet S., 2006: Hybrid scheduling for the parallel solution of linear systems. Parallel Computing, 32, 2, 136-156
Other parallel multi-frontal solvers: PARDISO, Harwell-Boeing
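
Returning to the PARALLEL MULTI-FRONTAL SOLVER ALGORITHM slides, the recursive pairing of frontal matrices can be sketched in a few lines. This is a minimal illustration and not the implementation of any of the solvers listed above. It assumes a number of intervals that is a power of two, boundary rows of the form "1 * u = value" with the values 0 and 20 taken from the summary slide, and a zero right-hand side in the interior. The function name `solve_by_pairwise_merging` is hypothetical, and an ordinary loop stands in for the processors that would handle the pairs of one tree level at the same time.

```python
def solve_by_pairwise_merging(n_intervals, left_value, right_value, h):
    """Merge 2x2 frontal systems pairwise up a binary tree, then back-substitute.

    Assumes n_intervals is a power of two (illustrative restriction).
    Returns the nodal values u_0..u_N and the number of tree levels.
    """
    c = 1.0 / h**2
    # Leaves: one frontal system (a, b, K, f) per interval [i, i+1].
    fronts = []
    for i in range(n_intervals):
        K = [[-c, c], [c, -c]]
        f = [0.0, 0.0]
        if i == 0:                       # assumed boundary row: u_0 = left_value
            K[0], f[0] = [1.0, 0.0], left_value
        if i == n_intervals - 1:         # assumed boundary row: u_N = right_value
            K[1], f[1] = [0.0, 1.0], right_value
        fronts.append((i, i + 1, K, f))

    eliminated = []   # fully assembled rows, kept for back substitution
    levels = 0
    while len(fronts) > 1:
        merged = []
        # Every pair on this level is independent of the others.
        for (a, m, KL, fL), (_, b, KR, fR) in zip(fronts[0::2], fronts[1::2]):
            # Row of the shared node m is fully assembled after summing.
            ca, d, cb = KL[1][0], KL[1][1] + KR[0][0], KR[0][1]
            g = fL[1] + fR[0]
            eliminated.append((m, a, b, ca, d, cb, g))
            # Eliminate u_m from the not fully assembled rows a and b.
            ma, mb = KL[0][1] / d, KR[1][0] / d
            K = [[KL[0][0] - ma * ca, -ma * cb],
                 [-mb * ca, KR[1][1] - mb * cb]]
            f = [fL[0] - ma * g, fR[1] - mb * g]
            merged.append((a, b, K, f))
        fronts = merged
        levels += 1

    # Root: a 2x2 system for the two end nodes, solved by Cramer's rule.
    a, b, K, f = fronts[0]
    det = K[0][0] * K[1][1] - K[0][1] * K[1][0]
    u = {a: (f[0] * K[1][1] - K[0][1] * f[1]) / det,
         b: (K[0][0] * f[1] - f[0] * K[1][0]) / det}
    # Back substitution from the root down to the leaves.
    for m, a, b, ca, d, cb, g in reversed(eliminated):
        u[m] = (g - ca * u[a] - cb * u[b]) / d
    return [u[i] for i in range(n_intervals + 1)], levels


u, levels = solve_by_pairwise_merging(8, 0.0, 20.0, h=1.0 / 8)
print("tree levels:", levels)
print("temperatures:", u)
```

With eight intervals the loop runs three levels, which matches the log N height of the tree. With two intervals the sketch reduces to the two-processor summary and gives u_1 = 10.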