Sparse LU Factorization


Sparse Matrix Methods

• Day 1: Overview

• Day 2: Direct methods

• Day 3: Iterative methods
  • The conjugate gradient algorithm
  • Parallel conjugate gradient and graph partitioning
  • Preconditioning methods and graph coloring
  • Domain decomposition and multigrid
  • Krylov subspace methods for other problems
  • Complexity of iterative and direct methods

SuperLU-dist:

Iterative refinement to improve solution

Iterate:

• r = b – A*x

• backerr = max_i ( |r_i| / ( |A|*|x| + |b| )_i )

• if backerr < ε or backerr > lasterr/2 then stop iterating

• solve L*U*dx = r

• x = x + dx

• lasterr = backerr

• repeat

Usually 0 – 3 steps are enough
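A minimal MATLAB sketch of this refinement loop. The gallery('poisson') matrix and the complete sparse LU factors below are only serial stand-ins for SuperLU-dist's distributed, possibly inexact factors:

    A = gallery('poisson', 30);          % stand-in sparse matrix
    b = A * ones(size(A,1), 1);          % right-hand side with known solution
    [L, U, P] = lu(A);                   % sparse LU with partial pivoting
    x = U \ (L \ (P*b));                 % initial solve
    lasterr = Inf;
    epsilon = 1e-12;                     % componentwise backward-error target
    for step = 1:5
        r = b - A*x;
        backerr = max( abs(r) ./ (abs(A)*abs(x) + abs(b)) );
        fprintf('step %d: backerr = %.2e\n', step, backerr);
        if backerr < epsilon || backerr > lasterr/2
            break                        % converged, or no longer improving
        end
        dx = U \ (L \ (P*r));            % reuse the existing factors
        x = x + dx;
        lasterr = backerr;
    end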

(Matlab demo)

• iterative refinement

Convergence analysis of iterative refinement

Let C = I – A(LU)⁻¹   [ so A = (I – C)·(LU) ]

x_1  = (LU)⁻¹ b
r_1  = b – A x_1 = (I – A(LU)⁻¹) b = C b
dx_1 = (LU)⁻¹ r_1 = (LU)⁻¹ C b
x_2  = x_1 + dx_1 = (LU)⁻¹ (I + C) b
r_2  = b – A x_2 = (I – (I – C)·(I + C)) b = C² b
. . .

In general,  r_k = b – A x_k = Cᵏ b

Thus r_k → 0 if |largest eigenvalue of C| < 1.

The Landscape of Sparse Ax=b Solvers

                         Nonsymmetric        Symmetric positive definite

  Direct (A = LU)        Pivoting LU         Cholesky

  Iterative (y' = Ay)    GMRES, QMR, …       Conjugate gradient

(Axis labels from the original 2-D picture: More Robust, More General, Less Storage.)

Conjugate gradient iteration

x_0 = 0,  r_0 = b,  p_0 = r_0

for k = 1, 2, 3, . . .
    α_k = (r_{k-1}ᵀ r_{k-1}) / (p_{k-1}ᵀ A p_{k-1})     step length
    x_k = x_{k-1} + α_k p_{k-1}                          approximate solution
    r_k = r_{k-1} – α_k A p_{k-1}                        residual
    β_k = (r_kᵀ r_k) / (r_{k-1}ᵀ r_{k-1})                improvement this step
    p_k = r_k + β_k p_{k-1}                              search direction

• One matrix-vector multiplication per iteration

• Two vector dot products per iteration

• Four n-vectors of working storage
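A direct MATLAB transcription of the iteration above (unpreconditioned CG; the Poisson matrix is just an example SPD system):

    A = gallery('poisson', 30);            % example SPD matrix (2-D Laplacian)
    b = A * ones(size(A,1), 1);
    x = zeros(size(b));  r = b;  p = r;    % x_0 = 0, r_0 = b, p_0 = r_0
    rho = r' * r;
    for k = 1:500
        Ap    = A * p;                     % the one matrix-vector product
        alpha = rho / (p' * Ap);           % step length
        x     = x + alpha * p;             % approximate solution
        r     = r - alpha * Ap;            % residual
        rho_k = r' * r;
        beta  = rho_k / rho;               % improvement this step
        p     = r + beta * p;              % next search direction
        rho   = rho_k;
        if sqrt(rho) < 1e-10 * norm(b), break, end
    end
    fprintf('CG: %d iterations, ||r|| = %.2e\n', k, sqrt(rho));

The four working n-vectors are x, r, p, and Ap, matching the storage count above.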

Conjugate gradient: Krylov subspaces

• Eigenvalues:
  Au = λu,  eigenvalues { λ_1, λ_2, . . ., λ_n }

• Cayley-Hamilton theorem:
  (A – λ_1 I)·(A – λ_2 I) · · · (A – λ_n I) = 0

  Therefore  Σ_{0 ≤ i ≤ n} c_i Aⁱ = 0  for some coefficients c_i
  (with c_0 = ±λ_1 λ_2 · · · λ_n = ±det(A) ≠ 0 since A is nonsingular), so
  A⁻¹ = Σ_{1 ≤ i ≤ n} (–c_i / c_0) Aⁱ⁻¹

• Krylov subspace:
  Therefore if Ax = b, then x = A⁻¹ b and
  x ∈ span(b, Ab, A²b, . . ., Aⁿ⁻¹b) = K_n(A, b)
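A small numerical sanity check of that claim (a dense random example; the diagonal shift just keeps A comfortably nonsingular):

    n = 8;
    A = randn(n) + n*eye(n);               % example nonsingular matrix
    b = randn(n, 1);
    K = zeros(n, n);  v = b;
    for i = 1:n
        K(:, i) = v / norm(v);             % (scaled) columns b, Ab, A^2 b, ...
        v = A * v;
    end
    x = A \ b;
    c = K \ x;                             % expand x in the Krylov basis
    norm(K*c - x)                          % ~ 0: x lies in K_n(A, b)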

Conjugate gradient: Orthogonal sequences

• Krylov subspace:
  K_i(A, b) = span(b, Ab, A²b, . . ., Aⁱ⁻¹b)

• Conjugate gradient algorithm: for i = 1, 2, 3, . . .
  find x_i ∈ K_i(A, b) such that r_i = (A x_i – b) ⊥ K_i(A, b)

• Notice r_i ∈ K_{i+1}(A, b), so r_i ⊥ r_j for all j < i

• Similarly, the “directions” are A-orthogonal:
  (x_i – x_{i-1})ᵀ · A · (x_j – x_{j-1}) = 0

• The magic: Short recurrences. . .

A is symmetric => can get next residual and direction from the previous one, without saving them all.

Conjugate gradient: Convergence

• In exact arithmetic, CG converges in n steps

(completely unrealistic!!)

• Accuracy after k steps of CG is related to:

• consider polynomials of degree k that are equal to 1 at 0.

• how small can such a polynomial be at all the eigenvalues of A?

• Thus, eigenvalues close together are good.

• Condition number:
  κ(A) = ||A||₂ ||A⁻¹||₂ = λ_max(A) / λ_min(A)

• Residual is reduced by a constant factor by O(κ(A)^1/2) iterations of CG.

(Matlab demo)

• CG on grid5(15) and bcsstk08

• n steps of CG on bcsstk08

Conjugate gradient: Parallel implementation

• Lay out matrix and vectors by rows

• Hard part is matrix-vector product y = A*x

• Algorithm

Each processor j:

Broadcast x(j)

Compute y(j) = A(j,:)*x

• May send more of x than needed

• Partition / reorder matrix to reduce communication

[Figure: block-row layout of y, A, and x across processors P0-P3]
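A serial MATLAB sketch of the row-block algorithm above, with p simulated processors; the broadcast of x(j) is modeled by letting every block read the full x:

    A = gallery('poisson', 16);  n = size(A,1);
    x = randn(n, 1);
    p = 4;                                   % number of simulated processors
    edges = round(linspace(0, n, p+1));      % row-block boundaries
    y = zeros(n, 1);
    for j = 1:p
        rows = edges(j)+1 : edges(j+1);      % rows owned by "processor" j
        y(rows) = A(rows, :) * x;            % local work: y(j) = A(j,:)*x
    end
    norm(y - A*x)                            % sanity check against the serial product
    % In a real code each processor needs only the entries of x matching the
    % nonzero columns of A(rows,:), hence the partitioning / reordering step.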

(Matlab demo)

• 2-way partition of eppstein mesh

• 8-way dice of eppstein mesh

Preconditioners

• Suppose you had a matrix B such that:

  1. the condition number κ(B⁻¹A) is small

  2. By = z is easy to solve

• Then you could solve (B⁻¹A)x = B⁻¹b instead of Ax = b

• B = A is great for (1), not for (2)

• B = I is great for (2), not for (1)

• Domain-specific approximations sometimes work

• B = diagonal of A sometimes works (see the sketch below)

• Or, bring back the direct methods technology. . .
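A sketch of the "B = diagonal of A" choice using MATLAB's built-in pcg. The rescaled Poisson matrix below is only a stand-in for a badly scaled SPD matrix such as bcsstk08; Jacobi preconditioning mostly undoes the bad diagonal scaling:

    A0 = gallery('poisson', 30);
    n  = size(A0, 1);
    D  = spdiags(logspace(0, 6, n)', 0, n, n);      % wildly varying scales
    A  = D * A0 * D;                                % badly scaled SPD matrix
    b  = A * ones(n, 1);
    B  = spdiags(full(diag(A)), 0, n, n);           % B = diagonal of A
    [~, ~, ~, it0] = pcg(A, b, 1e-8, 2000);         % no preconditioner
    [~, ~, ~, it1] = pcg(A, b, 1e-8, 2000, B);      % diagonal preconditioner
    fprintf('pcg iterations: %d (none) vs %d (diagonal)\n', it0, it1);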

(Matlab demo)

• bcsstk08 with diagonal precond

Incomplete Cholesky factorization (IC, ILU)

[Figure: A ≈ RᵀR]

• Compute factors of A by Gaussian elimination, but ignore fill

• Preconditioner B = RᵀR ≈ A, not formed explicitly

• Compute B⁻¹z by triangular solves (in time nnz(A))

• Total storage is O(nnz(A)), static data structure

• Either symmetric (IC) or nonsymmetric (ILU)
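A minimal sketch of IC(0) preconditioning with MATLAB's built-in ichol and pcg. Note that ichol returns a lower-triangular factor L with A ≈ L*L' (the slide's RᵀR, transposed); gallery('poisson') again stands in for an SPD test matrix:

    A = gallery('poisson', 50);                     % example SPD matrix
    b = A * ones(size(A,1), 1);
    L = ichol(A);                                   % no-fill factor: pattern of tril(A)
    [~, ~, ~, it0] = pcg(A, b, 1e-8, 1000);         % unpreconditioned CG
    [~, ~, ~, it1] = pcg(A, b, 1e-8, 1000, L, L');  % B = L*L', applied by triangular solves
    fprintf('pcg iterations: %d (none) vs %d (IC(0))\n', it0, it1);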

(Matlab demo)

• bcsstk08 with ic precond

Incomplete Cholesky and ILU: Variants

• Allow one or more “levels of fill”

• unpredictable storage requirements


• Allow fill whose magnitude exceeds a “drop tolerance” (see the sketch after this list)

• may get better approximate factors than levels of fill

• unpredictable storage requirements

• choice of tolerance is ad hoc

• Partial pivoting (for nonsymmetric A)

• “Modified ILU” (MIC): Add dropped fill to diagonal of U or R

• A and RᵀR have same row sums

• good in some PDE contexts
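A sketch of the drop-tolerance variant via ichol's 'ict' option: smaller droptol keeps more fill, giving a better preconditioner at the cost of the unpredictable storage noted above:

    A = gallery('poisson', 50);
    b = A * ones(size(A,1), 1);
    for droptol = [1e-1 1e-2 1e-3]
        L = ichol(A, struct('type', 'ict', 'droptol', droptol));
        [~, ~, ~, it] = pcg(A, b, 1e-8, 1000, L, L');
        fprintf('droptol %.0e: nnz = %d, %d pcg iterations\n', droptol, nnz(L), it);
    end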


Incomplete Cholesky and ILU: Issues

• Choice of parameters

• good: smooth transition from iterative to direct methods

• bad: very ad hoc, problem-dependent

• tradeoff: time per iteration (more fill => more time) vs # of iterations (more fill => fewer iters)

• Effectiveness

• condition number usually improves (only) by constant factor

(except MIC for some problems from PDEs)

• still, often good when tuned for a particular class of problems

• Parallelism

• Triangular solves are not very parallel

• Reordering for parallel triangular solve by graph coloring

(Matlab demo)

• 2-coloring of grid5(15)

Sparse approximate inverses

[Figure: sparsity patterns of A and B⁻¹]

• Compute B⁻¹ ≈ A⁻¹ explicitly

• Minimize || B⁻¹A – I ||_F  (in parallel, by columns; see the sketch below)

• Variants: factored form of B⁻¹, more fill, . . .

• Good: very parallel

• Bad: effectiveness varies widely
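A toy sketch of one common variant, the right approximate inverse: minimize ||A*M – I||_F column by column, with each column of M restricted to the sparsity pattern of the corresponding column of A (assumes a nonzero diagonal; the slide's left-inverse form works the same way row by row):

    A = gallery('poisson', 16);                % example sparse matrix
    n = size(A, 1);
    M = spalloc(n, n, nnz(A));
    for j = 1:n                                % each column is independent (parallel)
        J  = find(A(:, j));                    % allowed nonzeros in column j of M
        I  = find(any(A(:, J), 2));            % rows of A touched by those columns
        ej = zeros(length(I), 1);
        ej(I == j) = 1;                        % e_j restricted to rows I
        M(J, j) = full(A(I, J)) \ ej;          % small dense least-squares problem
    end
    norm(A*M - speye(n), 'fro')                % how good is the approximate inverse?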

Support graph preconditioners: example

[Vaidya]

[Figure: graphs G(A) and G(B)]

• A is symmetric positive definite with negative off-diagonal nonzeros

• B is a maximum-weight spanning tree for A

(with diagonal modified to preserve row sums)

• factor B in O(n) space and O(n) time

• applying the preconditioner costs O(n) time per iteration

Support graph preconditioners: example

[Figure: graphs G(A) and G(B)]

• support each edge of A by a path in B

• dilation( A edge ) = length of supporting path in B

• congestion( B edge ) = # of supported A edges

• p = max congestion, q = max dilation

• condition number κ(B⁻¹A) bounded by p·q (at most O(n²))

Support graph preconditioners: example

[Figure: graphs G(A) and G(B)]

• can improve congestion and dilation by adding a few strategically chosen edges to B

• cost of factor+solve is O(n^1.75), or O(n^1.2) if A is planar

• in recent experiments [Chen & Toledo], often better than drop-tolerance MIC for 2D problems, but not for 3D.

Domain decomposition (introduction)

        [ B    0    E ]
  A  =  [ 0    C    F ]
        [ Eᵀ   Fᵀ   G ]

• Partition the problem (e.g. the mesh) into subdomains

• Use solvers for the subdomains B⁻¹ and C⁻¹ to precondition an iterative solver on the interface

• Interface system is the Schur complement:

S = G – EᵀB⁻¹E – FᵀC⁻¹F

• Parallelizes naturally by subdomains
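A dense MATLAB sketch of the block elimination above, with random SPD stand-ins for the subdomain blocks B and C; the interface solve goes through the Schur complement S, and the direct solve of A is only there as a check:

    nB = 40;  nC = 40;  nG = 10;
    B = randn(nB);  B = B'*B + nB*eye(nB);            % SPD subdomain block
    C = randn(nC);  C = C'*C + nC*eye(nC);            % SPD subdomain block
    E = randn(nB, nG);  F = randn(nC, nG);
    S0 = randn(nG);  S0 = S0'*S0 + nG*eye(nG);
    G = S0 + E'*(B\E) + F'*(C\F);                     % makes A SPD, Schur complement = S0
    A = [B, zeros(nB,nC), E;  zeros(nC,nB), C, F;  E', F', G];
    b = randn(nB+nC+nG, 1);
    b1 = b(1:nB);  b2 = b(nB+1:nB+nC);  b3 = b(nB+nC+1:end);
    S  = G - E'*(B\E) - F'*(C\F);                     % interface (Schur complement) system
    x3 = S \ (b3 - E'*(B\b1) - F'*(C\b2));            % solve on the interface
    x1 = B \ (b1 - E*x3);                             % back-substitute into subdomain 1
    x2 = C \ (b2 - F*x3);                             % back-substitute into subdomain 2
    norm(A*[x1; x2; x3] - b)                          % ~ 0: agrees with the full system

In the preconditioned setting S is not formed explicitly; an iterative solver on the interface only needs products with S, each of which costs one solve with B and one with C.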

(Matlab demo)

• grid and matrix structure for overlapping 2-way partition of eppstein

Multigrid

(introduction)

• For a PDE on a fine mesh, precondition using a solution on a coarser mesh

• Use idea recursively on hierarchy of meshes

• Solves the model problem (Poisson’s eqn) in linear time!

• Often useful when hierarchy of meshes can be built

• Hard to parallelize coarse meshes well

• This is just the intuition – lots of theory and technology
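A minimal two-grid sketch of that intuition for the 1-D model problem: weighted-Jacobi smoothing plus one coarse-grid correction per cycle. A real multigrid code would recurse on the coarse problem instead of solving it directly:

    n  = 31;                                  % fine-grid interior points (odd)
    h  = 1/(n+1);
    A  = gallery('tridiag', n, -1, 2, -1) / h^2;
    b  = ones(n, 1);
    d  = full(diag(A));
    nc = (n-1)/2;                             % coarse-grid points
    P  = zeros(n, nc);                        % prolongation: linear interpolation
    for i = 1:nc
        P(2*i-1, i) = 0.5;  P(2*i, i) = 1;  P(2*i+1, i) = 0.5;
    end
    R  = P' / 2;                              % restriction (full weighting)
    Ac = R * A * P;                           % Galerkin coarse-grid operator
    x  = zeros(n, 1);
    for cycle = 1:10
        for s = 1:3, x = x + (2/3) * (b - A*x) ./ d; end   % pre-smoothing
        x = x + P * (Ac \ (R * (b - A*x)));                % coarse-grid correction
        for s = 1:3, x = x + (2/3) * (b - A*x) ./ d; end   % post-smoothing
        fprintf('cycle %d: ||r|| = %.2e\n', cycle, norm(b - A*x));
    end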

Other Krylov subspace methods

• Nonsymmetric linear systems:

• GMRES: for i = 1, 2, 3, . . .
  find x_i ∈ K_i(A, b) such that r_i = (A x_i – b) ⊥ K_i(A, b)

But, no short recurrence => save old vectors => lots more space

• BiCGStab, QMR, etc.:

Two spaces K_i(A, b) and K_i(Aᵀ, b) with mutually orthogonal bases

Short recurrences => O(n) space, but less robust

• Convergence and preconditioning more delicate than CG

• Active area of current research

• Eigenvalues: Lanczos (symmetric), Arnoldi (nonsymmetric)
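A sketch using MATLAB's built-in nonsymmetric Krylov solvers on a small convection-diffusion-like system (the matrix construction is only an example; the restart value 20 is what keeps GMRES's storage bounded):

    n  = 20;  e = ones(n, 1);  I = speye(n);
    D1 = spdiags([-e 2*e -e], -1:1, n, n);            % 1-D diffusion
    C1 = spdiags([-e zeros(n,1) e], -1:1, n, n) / 2;  % 1-D centered convection
    A  = kron(I, D1) + kron(D1, I) + 0.5*kron(I, C1); % nonsymmetric 2-D operator
    b  = A * ones(n^2, 1);
    x_gm = gmres(A, b, 20, 1e-8, 200);                % restarted GMRES(20): stores a basis
    x_bi = bicgstab(A, b, 1e-8, 400);                 % short recurrences, O(n) extra storage
    [norm(A*x_gm - b), norm(A*x_bi - b)]              % residuals of the two solves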

The Landscape of Sparse Ax=b Solvers

                         Nonsymmetric        Symmetric positive definite

  Direct (A = LU)        Pivoting LU         Cholesky

  Iterative (y' = Ay)    GMRES, QMR, …       Conjugate gradient

(Axis labels from the original 2-D picture: More Robust, More General, Less Storage.)

Complexity of direct methods

Time and space to solve any problem on any well-shaped finite element mesh
(2D: an n^1/2 × n^1/2 mesh;  3D: an n^1/3 × n^1/3 × n^1/3 mesh)

                   2D            3D
  Space (fill):    O(n log n)    O(n^4/3)
  Time (flops):    O(n^3/2)      O(n^2)

Complexity of linear solvers

Time to solve the model problem (Poisson’s equation) on a regular mesh
(2D: an n^1/2 × n^1/2 grid;  3D: an n^1/3 × n^1/3 × n^1/3 grid)

                            2D           3D
  Sparse Cholesky:          O(n^1.5)     O(n^2)
  CG, exact arithmetic:     O(n^2)       O(n^2)
  CG, no precond:           O(n^1.5)     O(n^1.33)
  CG, modified IC:          O(n^1.25)    O(n^1.17)
  CG, support trees:        O(n^1.20)    O(n^1.75)
  Multigrid:                O(n)         O(n)
