Numerical Methods I: Nonlinear equations and
nonlinear least squares
Georg Stadler
Courant Institute, NYU
stadler@cims.nyu.edu
Oct. 9 and Oct. 15, 2015
Organization
- Homework 2 due tomorrow morning; please hand in a printout, if possible.
- Homework 3 to be posted tomorrow.
- Today: Review of linear least squares; solving nonlinear equations;
  nonlinear least squares problems.
- Questions?
Sources/reading
Main reference:
Section 4 in Deuflhard/Hohmann.
Overview
Review of linear least squares and QR
Review: Least-squares problems
Given data points/measurements (ti, bi), i = 1, . . . , m, and a model
function φ that relates t and b:
    b = φ(t; x1, . . . , xn),
where x1, . . . , xn are model function parameters. If the model is supposed
to describe the data, the deviations/errors
    ∆i = bi − φ(ti; x1, . . . , xn)
should be small. Thus, to fit the model to the measurements, one must choose
x1, . . . , xn appropriately.
Review: Linear least-squares
We assume (for now) that the model depends linearly on
x1 , . . . , xn , e.g.:
    φ(t; x1, . . . , xn) = a1(t)x1 + . . . + an(t)xn.
Choosing the least-squares error, this results in
    min_x ‖Ax − b‖,
where x = (x1, . . . , xn)ᵀ, b = (b1, . . . , bm)ᵀ, and aij = aj(ti).
In the following, we study the over-determined case, i.e., m ≥ n.
Linear least-squares problems–QR factorization
Consider non-square matrices A ∈ Rm×n with m ≥ n and
rank(A) = n. Then the system
Ax = b
does, in general, not have a solution (more equations than
unknowns). We thus instead solve a minimization problem
    min_x ‖Ax − b‖².
The minimum x̄ of this optimization problem is characterized by the normal
equations:
    AᵀAx̄ = Aᵀb.
Linear least-squares problems–QR factorization
Solving the normal equations
    AᵀAx̄ = Aᵀb
requires:
- computing AᵀA (which costs O(mn²)),
- factoring AᵀA, whose condition number is the square of the condition number
  of A (problematic for the Cholesky factorization).
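A quick numerical illustration of the second point (a sketch; the Vandermonde
test matrix below is an arbitrary choice, not from the slides):

```python
import numpy as np

# Illustrates that cond(A^T A) = cond(A)^2 in the 2-norm, which is why
# forming the normal equations explicitly can be numerically problematic.
m, n = 50, 8
t = np.linspace(0.0, 1.0, m)
A = np.vander(t, n)                 # a mildly ill-conditioned test matrix
print(np.linalg.cond(A))            # cond(A)
print(np.linalg.cond(A.T @ A))      # roughly cond(A)**2
```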
Linear least-squares problems–QR factorization
One would like to avoid forming AᵀA and instead use a suitable factorization
of A that aids in solving the normal equation, the QR-factorization:
    A = QR = [Q1  Q2] [R1; 0] = Q1 R1,
where Q ∈ Rm×m is an orthogonal matrix (QQᵀ = I), and R = [R1; 0] ∈ Rm×n
consists of an upper triangular matrix R1 ∈ Rn×n and a block of zeros.
Linear least-squares problems–QR factorization
How can the QR factorization be used to solve the normal
equation?
    min_x ‖Ax − b‖² = min_x ‖Qᵀ(Ax − b)‖² = min_x ‖ [b1 − R1 x; b2] ‖²,
where
    Qᵀb = [b1; b2].
Thus, the least squares solution is x = R1⁻¹ b1 and the residual is ‖b2‖.
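A minimal NumPy sketch of this procedure (the random test problem is an
illustrative assumption); numpy.linalg.qr in reduced mode returns Q1 and R1
directly:

```python
import numpy as np

# Solve min_x ||Ax - b|| via the QR factorization A = Q1 R1.
rng = np.random.default_rng(1)
m, n = 50, 8
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

Q1, R1 = np.linalg.qr(A, mode="reduced")   # Q1: m x n, R1: n x n upper triangular
b1 = Q1.T @ b
x = np.linalg.solve(R1, b1)                # x = R1^{-1} b1

print(np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0]))  # agrees with lstsq
print(np.linalg.norm(A @ x - b))           # residual ||b2||
```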
Linear least-squares problems–QR factorization
How can we compute the QR factorization?
Givens rotations
- Use a sequence of rotations in 2D subspaces (planes).
- For m ≈ n: ∼ n²/2 square roots and (4/3)n³ multiplications.
- For m ≫ n: ∼ nm square roots and 2mn² multiplications.

Householder reflections
- Use a sequence of reflections about hyperplanes.
- For m ≈ n: (2/3)n³ multiplications.
- For m ≫ n: 2mn² multiplications.

These methods compute an orthonormal basis of the columns of A. An
alternative would be the Gram-Schmidt method; however, (classical)
Gram-Schmidt is unstable and thus sensitive to rounding errors.
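For illustration, a compact Householder QR sketch (in practice one would call
a library routine such as numpy.linalg.qr; this version accumulates Q
explicitly and makes no claim to efficiency):

```python
import numpy as np

def householder_qr(A):
    """Factor A (m x n, m >= n) as A = QR using Householder reflections."""
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    R = A.copy()
    Q = np.eye(m)
    for k in range(n):
        x = R[k:, k]
        v = x.copy()
        v[0] += np.copysign(np.linalg.norm(x), x[0])  # sign choice avoids cancellation
        nv = np.linalg.norm(v)
        if nv == 0.0:                  # column already zero below the diagonal
            continue
        v /= nv
        # apply the reflection H = I - 2 v v^T from the left to R, and
        # accumulate it into Q from the right
        R[k:, :] -= 2.0 * np.outer(v, v @ R[k:, :])
        Q[:, k:] -= 2.0 * np.outer(Q[:, k:] @ v, v)
    return Q, R

A = np.random.default_rng(2).standard_normal((6, 4))
Q, R = householder_qr(A)
print(np.allclose(Q @ R, A), np.allclose(Q.T @ Q, np.eye(6)))
```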
Overview
Nonlinear systems
Fixed point ideas
We intend to solve the nonlinear equation
    f(x) = 0,   x ∈ R.
Reformulation as a fixed point equation:
    x = Φ(x).
Corresponding iteration: Choose x0 (initialization) and compute x1, x2, . . .
from
    xk+1 = Φ(xk).
When does this iteration converge?
Fixed point ideas
Example: Solve the nonlinear equation
    2x − tan(x) = 0.
Iteration #1: xk+1 = Φ1(xk) = 0.5 tan(xk)
Iteration #2: xk+1 = Φ2(xk) = arctan(2xk)
Iteration #3: xk+1 = Φ3(xk) = xk − (2xk − tan(xk)) / (1 − tan²(xk))
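A small experiment (the starting guess x0 = 1.2 and the iteration count are
illustrative assumptions) makes the different behavior visible:

```python
import math

# Compare the three fixed point iterations for 2x - tan(x) = 0 near the
# nonzero root x* ~ 1.16556, starting from x0 = 1.2.
phi1 = lambda x: 0.5 * math.tan(x)
phi2 = lambda x: math.atan(2.0 * x)
phi3 = lambda x: x - (2.0 * x - math.tan(x)) / (1.0 - math.tan(x) ** 2)  # Newton

for name, phi in [("phi1", phi1), ("phi2", phi2), ("phi3", phi3)]:
    x = 1.2
    for _ in range(20):
        x = phi(x)
    print(name, x)

# phi2 converges linearly and phi3 (Newton's method) quadratically to x*;
# phi1 is not contractive near x* (|phi1'(x*)| = 0.5 (1 + 4 x*^2) > 1) and
# drifts away, ending up near the trivial root x = 0.
```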
Convergence of fixed point methods
A mapping Φ : [a, b] → R is called contractive on [a, b] if there is a Θ with
0 ≤ Θ < 1 such that
    |Φ(x) − Φ(y)| ≤ Θ |x − y|   for all x, y ∈ [a, b].
If Φ is continuously differentiable on [a, b], then
    sup_{x,y∈[a,b], x≠y} |Φ(x) − Φ(y)| / |x − y| = sup_{z∈[a,b]} |Φ′(z)|.
Convergence of fixed point methods
Let Φ : [a, b] → [a, b] be contractive with constant Θ < 1. Then:
- There exists a unique fixed point x̄ with x̄ = Φ(x̄).
- For any starting guess x0 in [a, b], the fixed point iteration converges
  to x̄ and
      |xk+1 − xk| ≤ Θ |xk − xk−1|        (linear convergence),
      |x̄ − xk| ≤ Θᵏ/(1 − Θ) |x1 − x0|.
The second expression allows one to estimate the required number of
iterations.
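For instance, the second bound can be solved for k; a small sketch (Θ = 0.35
is an assumed contraction bound for Φ2 from the previous example, not a value
from the slides):

```python
import math

def iterations_needed(theta, first_step, eps):
    """Smallest k with theta**k / (1 - theta) * first_step <= eps."""
    return math.ceil(math.log(eps * (1.0 - theta) / first_step) / math.log(theta))

x0 = 1.2
x1 = math.atan(2.0 * x0)                       # one step of phi2
print(iterations_needed(0.35, abs(x1 - x0), 1e-10))
```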
Newton’s method
In one dimension, solve f (x) = 0:
Start with x0 , and compute x1 , x2 , . . . from
    xk+1 = xk − f(xk)/f′(xk),    k = 0, 1, . . .
Requires f′(xk) ≠ 0 to be well-defined (i.e., the tangent has nonzero slope).
Newton’s method
Let F : Rn → Rn, n ≥ 1, and solve
    F(x) = 0.
Taylor expansion about the starting point x0:
    F(x) = F(x0) + F′(x0)(x − x0) + o(‖x − x0‖)    for x → x0.
Hence:
    x1 = x0 − F′(x0)⁻¹ F(x0).
Newton iteration: Start with x0 ∈ Rn, and for k = 0, 1, . . . compute
    F′(xk)∆xk = −F(xk),    xk+1 = xk + ∆xk.
Requires that F′(xk) ∈ Rn×n is invertible.
Newton’s method is affine invariant.
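A minimal sketch of the iteration (the 2×2 test system below is an
illustrative assumption, not from the slides):

```python
import numpy as np

def newton(F, J, x0, tol=1e-12, maxit=50):
    """Newton's method for F(x) = 0 with user-supplied Jacobian J = F'."""
    x = np.asarray(x0, dtype=float)
    for k in range(maxit):
        Fx = F(x)
        if np.linalg.norm(Fx) < tol:
            return x, k
        dx = np.linalg.solve(J(x), -Fx)   # F'(xk) dx = -F(xk)
        x = x + dx                        # xk+1 = xk + dx
    return x, maxit

# Toy example: intersect the circle x^2 + y^2 = 4 with the curve y = exp(x) - 1.
F = lambda v: np.array([v[0]**2 + v[1]**2 - 4.0, np.exp(v[0]) - 1.0 - v[1]])
J = lambda v: np.array([[2.0 * v[0], 2.0 * v[1]], [np.exp(v[0]), -1.0]])
print(newton(F, J, [1.0, 1.0]))
```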
Convergence of Newton’s method
Assumptions on F: D ⊂ Rn open and convex, F : D → Rn continuously
differentiable with F′(x) invertible for all x, and there exists ω ≥ 0 such
that
    ‖F′(x)⁻¹ (F′(x + sv) − F′(x)) v‖ ≤ s ω ‖v‖²
for all s ∈ [0, 1], x ∈ D, v ∈ Rn with x + v ∈ D.
Assumptions on x∗ and x0: There exists a solution x∗ ∈ D and a starting
point x0 ∈ D such that
    ρ := ‖x∗ − x0‖ ≤ 2/ω   and   Bρ(x∗) ⊂ D.
Theorem: Then the Newton sequence xk stays in Bρ(x∗), limk→∞ xk = x∗, and
    ‖xk+1 − x∗‖ ≤ (ω/2) ‖xk − x∗‖².
Newton’s method
Monotonicity test (affine invariant):
    ‖F′(xk)⁻¹ F(xk+1)‖ ≤ Θ̄ ‖F′(xk)⁻¹ F(xk)‖,    Θ̄ < 1
(a simpler, non-affine-invariant variant checks ‖F(xk+1)‖ ≤ Θ̄ ‖F(xk)‖).

Damping:
    xk+1 = xk + λk ∆xk,    0 < λk ≤ 1.
For difficult problems, start with a small λk and increase it later in the
iteration (close to the solution, λk should be 1).

Approximate Jacobians: Use approximate Jacobians F̃′(xk), e.g., computed
through finite differences.
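These ingredients can be combined as in the following sketch (Θ̄ = 0.9, the
finite-difference step h, and the toy test system are illustrative
assumptions; unlike the start-small-and-increase strategy mentioned above,
this simple variant reduces λ from 1 until the monotonicity test passes):

```python
import numpy as np

def fd_jacobian(F, x, h=1e-7):
    """Approximate Jacobian F'(x) by forward differences, column by column."""
    Fx = F(x)
    J = np.empty((Fx.size, x.size))
    for j in range(x.size):
        e = np.zeros(x.size)
        e[j] = h
        J[:, j] = (F(x + e) - Fx) / h
    return J

def damped_newton(F, x0, tol=1e-10, maxit=100, theta_bar=0.9):
    x = np.asarray(x0, dtype=float)
    for k in range(maxit):
        Fx = F(x)
        if np.linalg.norm(Fx) < tol:
            return x, k
        J = fd_jacobian(F, x)
        dx = np.linalg.solve(J, -Fx)          # ordinary Newton step
        ref = np.linalg.norm(dx)              # = ||F'(xk)^{-1} F(xk)||
        lam = 1.0
        while lam > 1e-10:                    # affine-invariant monotonicity test
            if np.linalg.norm(np.linalg.solve(J, F(x + lam * dx))) <= theta_bar * ref:
                break
            lam *= 0.5                        # damp the step
        x = x + lam * dx
    return x, maxit

F = lambda v: np.array([v[0]**2 + v[1]**2 - 4.0, np.exp(v[0]) - 1.0 - v[1]])
print(damped_newton(F, np.array([1.0, 1.0])))
```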
Newton’s method
Convergence of Newton’s method
- The “more nonlinear” a problem, the harder it is to solve.
- Computation of the Jacobian can be costly/complicated.
- There is no reliable black-box solver for nonlinear problems; at least for
  higher-dimensional problems, the structure of the problem must be taken
  into account.
- Sometimes, continuation ideas must be used to find good initializations:
  solve simpler problems first and use the solution as a starting point for
  harder problems.
Newton’s method
Role of initialization
Choice of the initialization x0 is critical. Depending on the initialization,
the Newton iteration might
- not converge (it could “blow up” or “oscillate” between two points),
- converge to different solutions,
- fail because it hits a point where the Jacobian is not invertible (this
  cannot happen if the conditions of the convergence theorem are satisfied),
- ...
Nonlinear versus linear problems
“Classification of mathematical problems as linear and nonlinear is
like classification of the Universe as bananas and non-bananas.”
or (according to Stanislaw Ulam):
“Using a term like nonlinear science is like referring to the bulk of
zoology as the study of non-elephant animals.”
Overview
Nonlinear least squares—Gauss-Newton
Nonlinear least-squares problems
Assume a least squares problem where the parameters x do not enter linearly
into the model. Instead of
    min_{x∈Rn} ‖Ax − b‖²,
we have, with F : D → Rm, D ⊂ Rn,
    min_{x∈Rn} g(x) := ½ ‖F(x)‖²,
where F(x)i = φ(ti; x) − bi, 1 ≤ i ≤ m.
The (local) minimum x∗ of this optimization problem satisfies:
    g′(x∗) = 0,   g″(x∗) is positive (semi-)definite.
Nonlinear least-squares problems
The derivative of g(·) is
    G(x) := g′(x) = F′(x)ᵀ F(x).
This is a nonlinear system in x, G : D → Rn. Let’s try to solve it using
Newton’s method:
    G′(xk)∆xk = −G(xk),    xk+1 = xk + ∆xk,
where
    G′(x) = F′(x)ᵀ F′(x) + F″(x)ᵀ F(x).
Nonlinear least-squares problems
F″(x) is a tensor. It is often neglected for the following reasons:
- It is difficult to compute, and we can use an approximate Jacobian in
  Newton’s method.
- If the data is compatible with the model, then F(x∗) = 0 and the term
  involving F″(x) drops out. If ‖F(x∗)‖ is small, neglecting that term might
  not slow down the convergence much.
- We know that g″(x∗) must be positive (semi-)definite. If F′(xk) has full
  rank, then F′(xk)ᵀ F′(xk) is positive definite and invertible.
Nonlinear least-squares problems—Gauss-Newton
The resulting Newton method for the nonlinear least squares problem is called
the Gauss-Newton method: Initialize x0 and for k = 0, 1, . . . solve
    F′(xk)ᵀ F′(xk) ∆xk = −F′(xk)ᵀ F(xk)        (solve)
    xk+1 = xk + ∆xk.                           (update step)
The solve step is the normal equation for the linear least squares problem
    min_{∆x} ‖F′(xk)∆x + F(xk)‖.
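A minimal Gauss-Newton sketch in NumPy (the exponential-fit model, the
synthetic data, and the stopping rule are illustrative assumptions); each
step solves the linear least squares problem above with numpy.linalg.lstsq,
which is equivalent to the normal equation on this slide:

```python
import numpy as np

def gauss_newton(F, J, x0, tol=1e-10, maxit=100):
    """Gauss-Newton for min_x 0.5 ||F(x)||^2 with residual F and Jacobian J = F'."""
    x = np.asarray(x0, dtype=float)
    for k in range(maxit):
        dx, *_ = np.linalg.lstsq(J(x), -F(x), rcond=None)  # min ||F'(xk) dx + F(xk)||
        x = x + dx
        if np.linalg.norm(dx) <= tol * (1.0 + np.linalg.norm(x)):
            break
    return x

# Fit the model phi(t; x) = x1 * exp(x2 * t) to synthetic data (ti, bi).
t = np.linspace(0.0, 1.0, 20)
b = 2.0 * np.exp(-1.5 * t) + 0.01 * np.random.default_rng(0).standard_normal(t.size)
F = lambda x: x[0] * np.exp(x[1] * t) - b                        # residual vector
J = lambda x: np.column_stack([np.exp(x[1] * t),                 # d/dx1
                               x[0] * t * np.exp(x[1] * t)])     # d/dx2
print(gauss_newton(F, J, [1.0, -1.0]))   # should be close to (2, -1.5)
```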
Convergence of Gauss-Newton method
Assumptions on F: D ⊂ Rn open and convex, F : D → Rm, m ≥ n, continuously
differentiable with F′(x) of full rank for all x, and let ω ≥ 0 and
0 ≤ κ∗ < 1 be such that
    ‖F′(x)⁺ (F′(x + sv) − F′(x)) v‖ ≤ s ω ‖v‖²
for all s ∈ [0, 1], x ∈ D, v ∈ Rn with x + v ∈ D.
Assumptions on x∗ and x0: Assume there exists a solution x∗ ∈ D of the least
squares problem and a starting point x0 ∈ D such that
    ‖F′(x)⁺ F(x∗)‖ ≤ κ∗ ‖x − x∗‖,
    ρ := ‖x∗ − x0‖ ≤ 2(1 − κ∗)/ω =: σ.
Theorem: Then, the sequence xk stays in Bρ(x∗), limk→∞ xk = x∗, and
    ‖xk+1 − x∗‖ ≤ (ω/2) ‖xk − x∗‖² + κ∗ ‖xk − x∗‖.
Convergence of Gauss-Newton method
- Role of κ∗: represents the omission of F″(x).
  - If κ∗ can be chosen as 0 ⇒ local quadratic convergence.
  - If κ∗ > 0 ⇒ linear convergence (thus we require κ∗ < 1).
- Damping strategy as before (better: a line search to make guaranteed
  progress in the minimization problem).
- There can, in principle, be multiple solutions.
Overview
Nonlinear systems depending on parameters
Nonlinear systems depending on parameters
Consider
    F(x, λ) with F : Rn × Rp → Rn,
where λ is a single parameter or a vector of parameters. This is a set of
parametrized nonlinear systems.
We would like to either
- explore the solution xλ for all λ, or
- find xλ0 for a specific λ0 that is difficult to solve for directly using
  Newton’s method (e.g., the solution to the Navier-Stokes equations at
  high Reynolds number).
Example:
    F(x, λ) = x(x³ − x − λ).
Nonlinear systems depending on parameters
Regularity assumption:
    F′(x, λ) has full rank when F(x, λ) = 0.
This excludes bifurcation points (due to the implicit function theorem). The
study of bifurcation points (often very interesting!) requires additional
work.
Continuation methods can exploit the local fast convergence of Newton’s
method: given a solution (x0, λ0), find the solution (x1, λ1) for λ1 close
to λ0.
Nonlinear systems depending on parameters
Continuation methods
- Classical continuation: Use x0 as the starting guess in Newton’s method
  for λ1.
- Tangent continuation: Differentiate F(x(s), λ0 + s) with respect to s and
  evaluate at s = 0. This leads to a better guess to initialize Newton’s
  method to find x1.
These basic ideas additionally require some robustification to result in
reliable algorithms; see Deuflhard/Hohmann.
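A small continuation sketch (the scalar example F(x, λ) = x³ − x − λ, the
λ-grid, and the step control are all illustrative assumptions) that uses the
tangent predictor followed by a Newton corrector:

```python
import numpy as np

def newton_scalar(f, df, x0, tol=1e-12, maxit=50):
    x = x0
    for _ in range(maxit):
        fx = f(x)
        if abs(fx) < tol:
            break
        x -= fx / df(x)
    return x

F       = lambda x, lam: x**3 - x - lam
dF_dx   = lambda x, lam: 3.0 * x**2 - 1.0
dF_dlam = lambda x, lam: -1.0

x, lam = 1.0, 0.0                       # F(1, 0) = 0 is a known solution
for lam_next in np.linspace(0.1, 2.0, 20):
    ds = lam_next - lam
    # Classical continuation would reuse x as the starting guess; tangent
    # continuation differentiates F(x(s), lam + s) = 0 with respect to s:
    #   dF/dx * x'(0) + dF/dlam = 0   =>   x'(0) = -dF_dlam / dF_dx.
    x_guess = x - ds * dF_dlam(x, lam) / dF_dx(x, lam)
    x = newton_scalar(lambda z: F(z, lam_next),
                      lambda z: dF_dx(z, lam_next), x_guess)
    lam = lam_next
print(x, lam)                            # follows the branch up to lam = 2
```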