Numerical Methods I: Linear solvers and least squares
Georg Stadler
Courant Institute, NYU
stadler@cims.nyu.edu
Sep. 24, 2015
Sources/reading
Main reference:
Sections 8 and 3 in Deuflhard/Hohmann.
Solving linear systems
We study the solution of linear systems of the form

  Ax = b

with A ∈ R^{n×n}, x, b ∈ R^n. We assume that this system has a
unique solution, i.e., A is invertible.

Solving linear systems is needed in many applications. Often, we
have to solve
▶ large systems (can be up to millions of unknowns, and more),
▶ as fast as possible, and
▶ accurately and reliably.
Kinds of linear systems
Solvers such as MATLAB’s \ take advantage of matrix properties:
▶ Dense matrix storage: only the entries are stored, as a 1D array
  (column- or row-wise).
▶ Sparse matrix storage: most a_ij = 0; only the nonzero entries are
  stored (indices and values). Sparse matrices occur in many
  applications.

Matrix application/system solution:
▶ Dense: the best way to compute Ax is direct summation.
▶ Fast algorithms for computing Ax for special matrices: FFT,
  FMM, . . .
▶ Sparse: most a_ij = 0; avoid fill-in in factorizations.
▶ Structured/unstructured: is the sparsity pattern easy to describe
  without storing it explicitly?
Kinds of linear systems and solvers
Symmetry, positivity, . . .
▶ Special factorizations for (skew-)symmetric matrices
▶ Special factorizations for positive definite matrices (Choleski)
▶ Diagonally dominant matrices don’t need pivoting

MATLAB’s \ (i.e., UMFPACK) chooses an appropriate algorithm after
studying properties of the matrix (details in the “backslash” book:
Tim Davis, Direct Methods for Sparse Linear Systems, SIAM, 2006).
Kinds of linear systems and solvers
[Figure: UMFPACK’s decision tree for dense matrices]

Kinds of linear systems and solvers
[Figure: UMFPACK’s decision tree for sparse matrices]
Kinds of linear systems and solvers
Factorization-based/direct solvers (dense/sparse LU, Choleski):
- require the matrix to fit into memory,
- require the matrix to be explicitly available (sometimes only a
  function that applies the matrix to a vector is available),
+ but compute the exact solution (up to rounding error).

Iterative solvers:
- only find an ε-approximation of the solution,
+ are able to solve very large problems,
+ often only require a function that computes Ax for a given x,
± might be faster or slower than a factorization-based method.
Kinds of linear systems and solvers
MATLAB demo
▶ What are the different storage formats (sparse/dense)? Is it
  always better to use one of them?
▶ How long does it take to solve sparse/dense systems?
▶ What is fill-in and how to avoid it?
Kinds of linear systems and solvers
MATLAB demo
Sparse/dense storage:
A = rand(2,2);             % dense 2x2 matrix
B = sparse(A);             % the same matrix in sparse storage
whos                       % compare the memory used by A and B

Fill-in:
A = bucky + 4*speye(60);   % sparse spd test matrix (buckyball graph)
r = symrcm(A);             % reverse Cuthill-McKee reordering
spy(A); spy(A(r,r)); spy(chol(A)); spy(chol(A(r,r)));   % compare sparsity and fill-in

Which sparse solver?
spparms('spumoni',1);      % make \ report which solver it chooses
A = gallery('poisson',8);  % 64x64 discrete Laplacian
b = randn(64,1);
A\b;
Iterative solution of (symmetric) linear systems
Target problems: very large (n = 10^5, 10^6, . . .); A is usually
sparse or has specific properties.

To solve

  Ax = b

we construct a sequence

  x^1, x^2, . . .

of iterates that converges as fast as possible to the solution x,
where x^{k+1} can be computed from {x^1, . . . , x^k} with as little
cost as possible (e.g., one matrix-vector multiplication).
Iterative solution of (symmetric) linear systems
Let Q be invertible. Then

  Ax = b ⇔ Q^{-1}(b − Ax) = 0
        ⇔ (I − Q^{-1}A)x + Q^{-1}b = x
        ⇔ Gx + c = x.

Theorem: The fixed point method x^{k+1} = Gx^k + c with an
invertible G converges for every starting point x^0 if and only if

  ρ(G) < 1,

where ρ(G) is the spectral radius of G, i.e., the largest absolute
value of an eigenvalue of G.
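As an illustration, here is a minimal MATLAB sketch of this fixed-point
iteration for the simplest choice Q = I (the Richardson method of the
next slide), i.e., G = I − A and c = b. The test matrix and its scaling
are assumptions chosen so that ρ(G) < 1:

% Fixed-point iteration x^{k+1} = G x^k + c with Q = I, i.e., G = I - A, c = b.
A = gallery('poisson',7);        % 49x49 sparse spd test matrix
A = A / norm(full(A));           % scale so the eigenvalues of A lie in (0,1]
b = randn(size(A,1),1);
x = zeros(size(b));              % starting point x^0
for k = 1:1000
  r = b - A*x;                   % residual
  x = x + r;                     % x^{k+1} = (I - A) x^k + b
  if norm(r) <= 1e-8*norm(b), break; end
end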
Iterative solution of (symmetric) linear systems
Choices for Q:
▶ Q = I . . . Richardson method

Consider A = L + D + U, where D is diagonal and L, U are strictly
lower and upper triangular (zero diagonal). Then:
▶ Q = D . . . Jacobi method
▶ Q = D + L . . . Gauss-Seidel method

Convergence depends on properties of A: Richardson converges if all
eigenvalues of A are in (0, 2), Jacobi converges for diagonally
dominant matrices, and Gauss-Seidel for spd matrices.
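A minimal MATLAB sketch of the Jacobi iteration (Q = D); the Poisson
test matrix from the demo above is an illustrative choice for which
the iteration converges:

% Jacobi: with A = L + D + U, x^{k+1} = D^{-1}(b - (L+U) x^k) = x^k + D^{-1}(b - A x^k).
A = gallery('poisson',8);        % 64x64 sparse spd test matrix
b = randn(64,1);
D = diag(diag(A));               % diagonal part D of A
x = zeros(64,1);                 % starting point x^0
for k = 1:5000
  x = x + D \ (b - A*x);         % Jacobi update
  if norm(b - A*x) <= 1e-8*norm(b), break; end
end

Replacing D by the lower triangular part tril(A) = D + L in the
update gives the Gauss-Seidel method.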
Iterative solution of (symmetric) linear systems
Relaxation methods: use a linear combination of the new and the
previous iterate:

  x^{k+1} = ω(Gx^k + c) + (1 − ω)x^k = G_ω x^k + ωc,

where ω ∈ [0, 1] is a damping parameter and G_ω = ωG + (1 − ω)I.
The goal is to choose the damping parameter such that ρ(G_ω) is as
small as possible. Optimal damping parameters can be computed for
Richardson and Jacobi using the eigenvalues of G (see
Deuflhard/Hohmann).
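The same sketch with relaxation; ω = 0.8 is an arbitrary illustrative
value, not the optimal damping parameter:

% Relaxed Jacobi: x^{k+1} = omega*(G x^k + c) + (1-omega)*x^k = x^k + omega*D^{-1}(b - A x^k).
A = gallery('poisson',8);  b = randn(64,1);
D = diag(diag(A));  x = zeros(64,1);  omega = 0.8;
for k = 1:5000
  x = x + omega * (D \ (b - A*x));   % damped Jacobi update
end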
Iterative solution of (symmetric) linear systems
Chebyshev acceleration
So far, the new iterate x^{k+1} only depended on x^k. This can be
improved by using all previous iterates when computing x^{k+1}. The
resulting schemes are called Chebyshev-accelerated methods, and they
usually converge faster than the original iterative schemes.

Chebyshev methods are based on computing linear combinations

  y^k := Σ_{j=0}^{k} v_{kj} x^j

with suitable coefficients v_{kj} such that y^0, y^1, . . . converges
faster than x^0, x^1, . . . The computation of the coefficients
requires knowledge of the eigenvalues of G.
Iterative solution of (symmetric) linear systems
Krylov methods:
Idea: build a basis for the Krylov subspace {r^0, Ar^0, A^2 r^0, . . .}
and reduce the residual optimally in that space.
▶ spd matrices: conjugate gradient (CG) method
▶ symmetric matrices: minimal residual method (MINRES)
▶ general matrices: generalized minimal residual method (GMRES),
  BiCG, BiCGSTAB

Properties:
Krylov methods do not require eigenvalue estimates; they usually
require one matrix-vector multiplication per iteration; convergence
depends on the eigenvalue structure of the matrix (clustering of
eigenvalues aids convergence). The availability of a good
preconditioner is often important. Some methods require storage of
the iteration vectors.
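As an illustration (the test problem and tolerances below are
assumptions), MATLAB's built-in Krylov solvers applied to a small spd
system; pcg fits the spd case, minres the symmetric case, gmres the
general case, and ichol provides a simple preconditioner:

% Krylov solvers on a sparse spd test matrix.
A = gallery('poisson',30);                        % 900x900 sparse spd matrix
b = randn(size(A,1),1);
tol = 1e-10;  maxit = 300;
[x1,flag1,rr1,it1] = pcg(A,b,tol,maxit);          % CG, no preconditioner
L = ichol(A);                                     % incomplete Choleski factor
[x2,flag2,rr2,it2] = pcg(A,b,tol,maxit,L,L');     % preconditioned CG
[x3,flag3,rr3,it3] = minres(A,b,tol,maxit);       % MINRES (symmetric case)
[x4,flag4,rr4,it4] = gmres(A,b,[],tol,maxit);     % GMRES without restart

Comparing it1 and it2 shows the effect of the preconditioner; gmres
stores one basis vector per iteration, which is why restarts are used
for large problems.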
Least-squares problems
Given data points/measurements

  (t_i, b_i),   i = 1, . . . , m,

and a model function φ that relates t and b:

  b = φ(t; x_1, . . . , x_n),

where x_1, . . . , x_n are model function parameters. If the model is
supposed to describe the data, the deviations/errors

  ∆_i = b_i − φ(t_i; x_1, . . . , x_n)

should be small. Thus, to fit the model to the measurements, one must
choose x_1, . . . , x_n appropriately.
Least-squares problems
Measuring deviations
Least squares: Find x_1, . . . , x_n such that

  (1/2) Σ_{i=1}^{m} ∆_i^2 → min.

From a probabilistic perspective, this corresponds to an underlying
Gaussian noise model.

Weighted least squares: Find x_1, . . . , x_n such that

  (1/2) Σ_{i=1}^{m} (∆_i / δb_i)^2 → min,

where the δb_i > 0 contain information about how much we trust the
i-th data point.
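A small MATLAB sketch of weighted least squares (the model, data, and
weights δb_i are illustrative assumptions, using a linear model in
matrix form as introduced on the following slides): dividing the i-th
equation by δb_i turns the weighted problem into an ordinary
least-squares problem.

% Weighted least squares by row scaling.
m  = 100;  t = linspace(0,1,m)';
A  = [ones(m,1), t];               % illustrative model phi(t) = x1 + x2*t
b  = 2 + 3*t + 0.05*randn(m,1);    % synthetic measurements
db = 0.05*ones(m,1);               % assumed uncertainties delta b_i
W  = spdiags(1./db, 0, m, m);      % diagonal scaling by 1/delta b_i
x  = (W*A) \ (W*b);                % weighted least-squares fit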
Least-squares problems
Measuring deviations
Alternatives to using squares:
L1 error: Find x_1, . . . , x_n such that

  Σ_{i=1}^{m} |∆_i| → min.

The result can be very different, has a different statistical
interpretation, and is more stable with respect to outliers.

L∞ error: Find x_1, . . . , x_n such that

  max_{1≤i≤m} |∆_i| → min.
Linear least-squares
We assume (for now) that the model depends linearly on
x_1, . . . , x_n, e.g.,

  φ(t; x_1, . . . , x_n) = a_1(t)x_1 + . . . + a_n(t)x_n.

Choosing the least-squares error, this results in

  min_x ‖Ax − b‖,

where x = (x_1, . . . , x_n)^T, b = (b_1, . . . , b_m)^T, and
a_ij = a_j(t_i). In the following, we study the overdetermined case,
i.e., m ≥ n.
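A MATLAB sketch of assembling the matrix a_ij = a_j(t_i) and solving
min_x ‖Ax − b‖ with backslash, which returns a least-squares solution
for overdetermined systems; the basis functions and data are
illustrative assumptions:

% Assemble the design matrix a_ij = a_j(t_i) and solve min ||Ax - b||_2.
m = 200;  t = linspace(-1,1,m)';
basis = {@(t) ones(size(t)), @(t) t, @(t) t.^2};   % a_1, a_2, a_3
A = zeros(m, numel(basis));
for j = 1:numel(basis)
  A(:,j) = basis{j}(t);                            % j-th column: a_j(t_i)
end
b = 1 - 2*t + 0.5*t.^2 + 0.01*randn(m,1);          % synthetic noisy data
x = A \ b;                                         % least-squares solution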
Linear least-squares problems–QR factorization
Consider non-square matrices A ∈ R^{m×n} with m ≥ n and
rank(A) = n. Then the system

  Ax = b

does, in general, not have a solution (more equations than unknowns).
We thus instead solve the minimization problem

  min_x ‖Ax − b‖_2.

The minimizer x̄ of this optimization problem is characterized by the
normal equations:

  A^T A x̄ = A^T b.
Linear least-squares problems–QR factorization
Solving the normal equations

  A^T A x̄ = A^T b

directly has two drawbacks:
▶ it requires computing A^T A (which costs O(mn^2) operations), and
▶ the condition number of A^T A is the square of the condition number
  of A (problematic for the Choleski factorization); see the sketch
  below.
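A quick MATLAB illustration of the second point, using an
ill-conditioned monomial design matrix as an assumed example:

% cond(A'*A) is the square of cond(A).
m = 100;  n = 10;
t = linspace(0,1,m)';
A = zeros(m,n);
for j = 1:n
  A(:,j) = t.^(j-1);                          % monomial basis t^0, ..., t^(n-1)
end
fprintf('cond(A)    = %.2e\n', cond(A));
fprintf('cond(A''*A) = %.2e\n', cond(A'*A));  % approximately cond(A)^2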
Linear least-squares problems–QR factorization
Conditioning
Solving the normal equations is equivalent to computing Pb, the
orthogonal projection of b onto the subspace V spanned by the
columns of A.

Let x be the solution of the least-squares problem, denote the
residual by r = b − Ax, and define

  sin(θ) = ‖r‖_2 / ‖b‖_2.
Linear least-squares problems–QR factorization
Conditioning
The relative condition number κ of x in the Euclidean norm is
bounded as follows:
▶ with respect to perturbations in b:

    κ ≤ κ_2(A) / cos(θ),

▶ with respect to perturbations in A:

    κ ≤ κ_2(A) + κ_2(A)^2 tan(θ).

Small-residual problems (cos(θ) ≈ 1, tan(θ) ≈ 0): behavior similar to
a linear system.
Large-residual problems (cos(θ) ≪ 1, tan(θ) > 1): behavior
essentially different from a linear system.
Linear least-squares problems–QR factorization
One would like to avoid forming A^T A and instead use a suitable
factorization of A that aids in solving the normal equations: the
QR factorization

  A = QR = [Q_1, Q_2] [R_1; 0] = Q_1 R_1,

where Q ∈ R^{m×m} is an orthogonal matrix (QQ^T = I), and
R ∈ R^{m×n} consists of an upper triangular matrix R_1 ∈ R^{n×n}
on top of a block of zeros.
Linear least-squares problems–QR factorization
How can the QR factorization be used to solve the normal equations?

  min_x ‖Ax − b‖_2 = min_x ‖Q^T(Ax − b)‖_2 = min_x ‖ [b_1 − R_1 x; b_2] ‖_2,

where

  Q^T b = [b_1; b_2].

Thus, the least-squares solution is x = R_1^{-1} b_1 and the residual
norm is ‖b_2‖.
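A MATLAB sketch of this procedure using the economy-size QR
factorization A = Q_1 R_1 (the matrix and data below are illustrative
assumptions):

% Solve min ||Ax - b||_2 via QR: x = R1 \ (Q1'*b), residual norm = ||b2||.
m = 200;  t = linspace(-1,1,m)';
A = [ones(m,1), t, t.^2];
b = 1 - 2*t + 0.5*t.^2 + 0.01*randn(m,1);
[Q1,R1] = qr(A,0);            % economy-size QR: Q1 is m x n, R1 is n x n
x   = R1 \ (Q1'*b);           % least-squares solution
res = norm(b - A*x);          % equals ||b2|| up to rounding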
Linear least-squares problems–QR factorization
How can we compute the QR factorization?

Givens rotations
Use a sequence of rotations in 2D subspaces:
For m ≈ n: ∼ n^2/2 square roots and (4/3)n^3 multiplications
For m ≫ n: ∼ nm square roots and 2mn^2 multiplications

Householder reflections
Use a sequence of reflections about hyperplanes (a minimal sketch
follows below):
For m ≈ n: (2/3)n^3 multiplications
For m ≫ n: 2mn^2 multiplications
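To make the Householder approach concrete, here is a minimal,
unoptimized MATLAB sketch of Householder QR (it assumes A has full
column rank and is not the implementation behind MATLAB's qr):

function [Q,R] = house_qr(A)
% Householder QR sketch: one reflection per column zeroes the entries
% below the diagonal; Q accumulates the reflections so that Q*R = A.
[m,n] = size(A);
Q = eye(m);  R = A;
for j = 1:n
  v = R(j:m,j);                                   % column to eliminate
  v(1) = v(1) + sign(v(1) + (v(1)==0))*norm(v);   % Householder vector
  v = v / norm(v);
  R(j:m,j:n) = R(j:m,j:n) - 2*v*(v'*R(j:m,j:n));  % apply I - 2vv' from the left
  Q(:,j:m)   = Q(:,j:m)   - 2*(Q(:,j:m)*v)*v';    % accumulate Q from the right
end
end

A quick check: for A = randn(6,3), [Q,R] = house_qr(A) gives Q*R ≈ A
with Q orthogonal and R upper triangular up to rounding error.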