VV471 — Introduction to Numerical Methods
Project
Ailin & Manuel — UM-JI (Summer 2022)

Goals of the project
• Perform research beyond what is taught
• Select content and clearly organise ideas
• Orally present results in an appealing way
• Be critical of your results
1 Instructions
1.1 Format
Each of you needs to be in one and only one 3-member team for the project. The project will be graded
according to the following three aspects:
1. Slide Presentation of your work
2. Clarity of the explanations of your work
3. Demonstration of the accuracy and efficiency of your algorithm
Each of those three aspects has an equal weight. Each member of the same team will receive the same
project mark.
1.2 Due
The oral presentation and testing of your algorithm are due in week 12. The exact time for the presentation and testing will be released in week 11.
1.3 Expectation
The following two sections are relevant topics to get your own research started on how to solve the
problem stated in section 4. They are by no means the algorithms, nor the only algorithms, that you need
to solve the problem. We expect you to read academic articles as well as textbooks. We expect your
presentation to effectively summarise the important aspects of your research on the problem within 15
min; it will be followed by a 3-5 minute question-and-answer period. We do not expect you to present
all aspects of what you have done during the research process. However, we expect you as a group to
know very well what you have done so that you can explain to us if we choose to ask a question on a
different aspect of your solution. After your presentation, we expect to see your algorithm in action. You
will demonstrate your algorithm on your machine. We expect accuracy and efficiency. Each group that
achieves a certain accuracy will receive a ranking according to the average time over 100 trials of your
algorithm. We expect at least 2 hours of work from each student per week to be put into the project.
2 Multidimensional root-finding
In this part, we consider the problem of solving a system of n equations in n unknowns,

f1(x1, x2, ..., xn) = 0,
f2(x1, x2, ..., xn) = 0,
⋮
fn(x1, x2, ..., xn) = 0.
For simplicity, we express this system of nonlinear equations in vector form, F(x) = 0, where F : D ⊆ Rn → Rn is a vector-valued function of n variables represented by the vectors

x = (x1, x2, ..., xn)T and F = (f1, f2, ..., fn)T.
The component functions f1 , f2 , ..., fn are functions of x1 , x2 , ..., xn in general.
2.1 Limit, Continuity and Differentiability in Rn
Recall that the notion of limit can be easily generalised to vector-valued functions and functions of several variables. Given a function f : D ⊆ Rn → R and a point x0 ∈ D, we write lim_{x→x0} f(x) = L if, for any ϵ > 0, there exists a δ > 0 such that |f(x) − L| < ϵ whenever x ∈ D and 0 < ∥x − x0∥ < δ, where ∥·∥ denotes any appropriate norm on Rn.
The function f is said to be continuous at x0 ∈ D if lim_{x→x0} f(x) = f(x0). Furthermore, f is said to be differentiable at x0 ∈ D if there is a constant vector a such that

lim_{x→x0} ( f(x) − f(x0) − aT(x − x0) ) / ∥x − x0∥ = 0.
Similarly, for F : D ⊆ Rn → Rn and x0 ∈ D, we write lim_{x→x0} F(x) = L if and only if

lim_{x→x0} fi(x) = Li for all i = 1, 2, ..., n.
And F is said to be continuous at x0 ∈ D if lim_{x→x0} F(x) = F(x0). Furthermore, F is said to be differentiable at x0 if there exists a linear transformation T : Rn → Rn such that

lim_{x→x0} ∥F(x) − F(x0) − T(x − x0)∥ / ∥x − x0∥ = 0,
where the matrix associated with the linear transformation T is the matrix of partial derivatives, namely,
the Jacobian matrix J.
2.2 Fixed-point iteration
Now suppose we have the following function

F(x) = G(x) − x,

where G : D ⊆ Rn → Rn. In class, we have discussed why fixed-point iteration can be used to find the root of a single function of the form

f(x) = g(x) − x,

where g is Lipschitz continuous with Lipschitz constant c ∈ [0, 1). In the following assignment, we learned that if the function g is instead continuously differentiable and maps an interval I into itself, then g has a fixed point in I. Furthermore, if there is a constant c < 1 such that

|g′(x)| ≤ c for all x ∈ I,
then the fixed-point iteration will converge to the fixed point. The existence and uniqueness of fixed
points of vector-valued functions of several variables and the convergence result are in line with the
single-variable case. Below are the analogues of the two theorems in higher dimensions.
Let D ⊆ Rn be closed and G : D → D be Lipschitz continuous, that is, there is a constant c such that

∥G(x1) − G(x2)∥ ≤ c ∥x1 − x2∥ for all x1, x2 ∈ D.

If this c, which is known as a Lipschitz constant, satisfies 0 ≤ c < 1, then the function G has a unique fixed point x∗ ∈ D and the sequence defined by

x0 ∈ D and xk = G(xk−1) for k = 1, 2, 3, ...

converges to x∗.
Let D ⊆ Rn be closed and G : D → D be continuously differentiable. Furthermore, suppose there exists a constant c < 1 and a matrix norm ∥·∥ such that

∥JG(x)∥ ≤ c for all x ∈ D.

Then G has a unique fixed point x∗ in D, and fixed-point iteration is guaranteed to converge to x∗ for any initial point x0 ∈ D.
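As an illustration, here is a minimal MATLAB sketch of fixed-point iteration for a vector-valued G; the function name fixed_point, the handle G, the tolerance tol, and the iteration cap maxit are names chosen for this example only.

function [x, k] = fixed_point(G, x0, tol, maxit)
% Fixed-point iteration x_k = G(x_{k-1}) for a contraction G : D -> D.
    x = x0;
    for k = 1:maxit
        xnew = G(x);
        if norm(xnew - x) < tol     % stop once successive iterates are close
            x = xnew;
            return
        end
        x = xnew;
    end
end

For instance, G = @(x) [0.5*cos(x(2)); 0.5*sin(x(1))] satisfies ∥JG(x)∥ ≤ 0.5 < 1 in the ∞-norm on R2, so fixed_point(G, [0; 0], 1e-10, 100) converges to its unique fixed point.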
2.3 Newton’s method
We have learned in class that Newton’s method can achieve quadratic convergence. Recall that this
method uses the following iteration formula:

xk+1 = xk − f(xk)/f′(xk),   k = 0, 1, 2, ...,

where x0 is an initial point.
Here we extend Newton's method to systems of nonlinear equations. Let us define

G(x) = x − JF(x)⁻¹ F(x),

where JF(x) is the Jacobian matrix of F evaluated at x. Similar to what we did in class for the single-variable case, in order to show quadratic convergence, all we need is to show that the derivative, here the Jacobian of G, is zero at the root x = x∗, given that F has an appropriate level of smoothness. If we define

 
x = (x1, x2, ..., xn)T,
F(x) = (f1(x1, x2, ..., xn), f2(x1, x2, ..., xn), ..., fn(x1, x2, ..., xn))T,
and G(x) = (g1(x1, x2, ..., xn), g2(x1, x2, ..., xn), ..., gn(x1, x2, ..., xn))T,
where fi and gi are the component functions of F and G, then we have

∂gi/∂xk = ∂/∂xk [ xi − Σ_{j=1}^{n} aij(x) fj(x) ]
        = δik − Σ_{j=1}^{n} aij(x) ∂fj/∂xk − Σ_{j=1}^{n} (∂aij/∂xk) fj(x),

where aij is the ij-th element of JF(x)⁻¹.
Note that the first summation is equal to δik, since ∂fj/∂xk is the jk-th element of JF(x) by definition, and thus the summation is equal to the ik-th element of JF(x)⁻¹ JF(x) = I.
The second summation vanishes at x∗ since F(x∗) = 0. Hence we reach the conclusion

∇gi(x∗) = 0 for all i = 1, 2, ..., n   =⇒   JG(x∗) = 0.

Since JG is the zero matrix at x∗, and G is continuously differentiable at x∗ for a sufficiently smooth F that has an invertible JF(x∗), there is a constant c < 1 and a norm ∥·∥ such that

∥JG(x)∥ ≤ c for all x ∈ D.
This shows that Newton's method in this case can achieve quadratic convergence near the root x∗. To apply a multidimensional version of Newton's method, we choose an initial point x0 in the hope that it is sufficiently close to x∗. Then we iterate as follows:

yk = −F(xk)
wk = JF(xk)⁻¹ yk
xk+1 = xk + wk,

where the vector wk is computed by solving the system

JF(xk) wk = yk

using a method such as Gaussian elimination with back substitution. The above scheme can be applied to a function F : Rn → Rm with only a slight modification to how we solve for wk. When m < n, we use the pseudoinverse, by which we can find one of potentially many roots of F. When m > n, we can try the least-squares solution, but the convergence of this scheme becomes doubtful.
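A minimal MATLAB sketch of this iteration is given below; the names newton_system, F, JF, tol, and maxit are choices made for this illustration. Note that the backslash solve is only a placeholder: for the project, mldivide is not allowed, so it would have to be replaced by your own Gaussian elimination with back substitution.

function [x, k] = newton_system(F, JF, x0, tol, maxit)
% Multidimensional Newton's method for F(x) = 0, given the Jacobian JF.
    x = x0;
    for k = 1:maxit
        y = -F(x);
        w = JF(x) \ y;   % placeholder linear solve for JF(x) w = y
        x = x + w;
        if norm(w) < tol
            return
        end
    end
end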
2.4 Quasi-Newton
When n becomes large, Newton's method becomes very expensive due to the computation of the Jacobian matrix JF(xk) in each iteration and the computation of wk by solving

JF(xk) wk = yk.
Furthermore, it is not possible to take information from one iteration to the next since the Jacobian matrix
is changing every iteration. An alternative is to modify Newton’s method by using approximate derivatives
as in the secant method for a single nonlinear equation. The secant method uses the approximation
f′(x1) ≈ (f(x1) − f(x0)) / (x1 − x0) =: A1

as a replacement for f′(x1) in the single-variable Newton's method in the first iteration. Note that the approximation A1 is a scalar such that

A1 (x1 − x0) = f(x1) − f(x0).
Along this line of thinking, we replace the Jacobian matrix JF (x1 ) by a matrix A1 that satisfies
A1 (x1 − x0 ) = F (x1 ) − F (x0 )
Any nonzero vector in Rn can be written as a sum of a scalar multiple of x1 − x0 and a scalar multiple of
a vector in the orthogonal complement of x1 − x0 . So to uniquely define the matrix A1 , we also need to
specify how A1 maps the orthogonal complement of x1 − x0 . However, no information is available about
the change in F in a direction orthogonal to x1 − x0. So we specify that there is no update in this direction, that is,

A1 z = JF(x0) z whenever (x1 − x0)T z = 0.
It is clear that the following matrix satisfies both conditions:

A1 = JF(x0) + [F(x1) − F(x0) − JF(x0)(x1 − x0)] (x1 − x0)T / ∥x1 − x0∥₂².
So we use this A1 in place of JF(x1) to determine x2, then A2 to determine x3, and so on. In general,

sk = xk − xk−1
yk = F(xk) − F(xk−1)
Ak = Ak−1 + (yk − Ak−1 sk) skT / ∥sk∥₂²
wk = Ak⁻¹ F(xk)
xk+1 = xk − wk.
However, simply approximating JF(xk) by Ak does not reduce the cost by much, because we still have to incur the same cost of solving the linear system

Ak wk = F(xk).

So to truly cut the cost, we have to find a way to efficiently update the inverse of Ak from previously available information. This can be done by employing a matrix inversion formula known as the Sherman–Morrison formula:

(B + a bT)⁻¹ = B⁻¹ − (B⁻¹ a bT B⁻¹) / (1 + bT B⁻¹ a),
where B is invertible and 1 + bT B⁻¹ a ≠ 0. So by letting

B = Ak−1,   a = (yk − Ak−1 sk) / ∥sk∥₂²,   and   b = sk,

the Sherman–Morrison formula gives the following formula for the inverse:
Ak⁻¹ = Ak−1⁻¹ + (sk − Ak−1⁻¹ yk) skT Ak−1⁻¹ / (skT Ak−1⁻¹ yk).
Combining the above approximation for the Jacobian matrix and this inversion formula gives what is
known as Broyden's method. Note that it is not necessary to compute the approximation for the Jacobian matrix, namely Ak, for k ≥ 1; Ak⁻¹ is computed directly.
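A minimal MATLAB sketch of Broyden's method along these lines is shown below; the names broyden, F, B0inv, tol, and maxit are chosen for this illustration, and B0inv stands for an initial approximation of JF(x0)⁻¹ (obtained, for instance, from finite differences), which the scheme above does not prescribe.

function [x, k] = broyden(F, x0, B0inv, tol, maxit)
% Broyden's method: update the inverse Jacobian approximation directly,
% so no linear system has to be solved inside the loop.
    x = x0;
    Binv = B0inv;
    Fx = F(x);
    for k = 1:maxit
        w = Binv * Fx;            % w_k = A_k^{-1} F(x_k)
        xnew = x - w;
        Fnew = F(xnew);
        s = xnew - x;
        y = Fnew - Fx;
        By = Binv * y;
        % Sherman-Morrison update of the inverse approximation
        Binv = Binv + ((s - By) * (s' * Binv)) / (s' * By);
        x = xnew;
        Fx = Fnew;
        if norm(Fx) < tol
            return
        end
    end
end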
3 Optimization
3.1 Introduction
In this part, we consider the general problem of finding the minimum of f : Rn → R with or without constraints of the form

g(x) = 0 and/or h(x) ≥ 0, where x ∈ Rn.
3.2 Nonsmoothness
If the function f is sufficiently smooth and convex, then in the absence of any constraint we can turn the optimisation problem into a root-finding problem by finding x∗ such that

∇f(x∗) = 0.

So our discussion of Newton's method and quasi-Newton methods is applicable with little or no modification. In fact, finding a minimum of a function of many variables is easier in comparison with
multidimensional root-finding. This is because the component functions of an n-dimensional gradient
vector are not completely arbitrary functions. Rather, they obey so-called integrability conditions. So
those methods are usually fast for well-behaved functions; however, they are still not very robust. That is,
if the function is not sufficiently smooth or the initial guess is not sufficiently close to the minimum, then
they may converge very slowly or even diverge. The downhill simplex method is an alternative that makes
no assumptions about f and requires no derivatives. It is less efficient but it is very easy to understand
and implement. We cover it here so that we have a fallback method for functions that are too problematic for Newton and quasi-Newton methods to handle. In many cases, the downhill simplex method is faster than
simulated annealing but usually slower than Newton and Quasi-Newton. It is based on the geometric
object known as a simplex. To define a simplex properly, consider the following definitions:
A linear combination of x1, x2, ..., xn,

Σ_{i=1}^{n} αi xi = α1 x1 + α2 x2 + · · · + αn xn,

is known as a convex combination if all coefficients are non-negative and sum to 1, that is,

Σ_{i=1}^{n} αi = 1 and αi ≥ 0 for all i = 1, ..., n.
The convex hull of a set S ⊆ Rn is the set of all convex combinations of points xi ∈ S.
A simplex is a geometric object in Rn that represents the convex hull of n + 1 points in Rn, which are known as the vertices of the simplex. So a simplex in R is a line segment, a simplex in R2 is a triangle, a simplex in R3 is a tetrahedron, and so on. The downhill simplex method iteratively generates a sequence of smaller and smaller simplexes in Rn to enclose the minimum point x∗. At each iteration, the vertices x1, x2, ..., xn+1 of the simplex are ordered according to their f(x) values:

f(x1) ≤ f(x2) ≤ · · · ≤ f(xn+1).
The point x1 is referred to as the best vertex, and xn+1 as the worst vertex. The downhill simplex method uses four possible operations to iterate from one simplex to the next: reflection (α > 0), expansion (β > 1), contraction (0 < γ < 1), and shrink (0 < δ < 1). Each operation is associated with one of these scalar parameters, and all but the shrink step use the centroid of the n vertices other than the worst vertex,

x̄ = (1/n) Σ_{j=1}^{n} xj.
The downhill simplex method iterates by the following scheme:
1. Sort:
Evaluate f at all n + 1 vertices of the current simplex and sort the vertices so that
f(x1) ≤ f(x2) ≤ · · · ≤ f(xn+1)
2. Reflection:
Compute the reflection point
xr = x̄ + α(x̄ − xn+1 )
# Since xn+1 is the worst vertex, we want to move away from it. This so-called reflection point is
just a test point along the line defined by x̄ and xn+1, away from the point xn+1.
3. Conditional Execution:
If f (x1 ) ≤ f (xr ) < f (xn ), then replace the worst vertex xn+1 with xr . Next Iteration.
If f (xr ) < f (x1 ), then do the Expansion step.
If f (xn ) ≤ f (xr ) < f (xn+1 ), then jump to the Outside Contraction step.
If f (xn+1 ) ≤ f (xr ), then jump to the Inside Contraction step.
4. Expansion:
Compute the expansion point
xe = x̄ + β(xr − x̄)
# If moving along the line defined by x̄ and xn+1 away from the point xn+1 gives a new best vertex,
perhaps the minimum is just farther away in that region. So we move further into this region by
pushing further along this direction.
5. Conditional Execution:
If f (xe ) < f (xr ), then replace the worst vertex xn+1 with xe ; else replace xn+1 with xr ;
Next Iteration.
6. Outside Contraction:
Compute the outside contraction point
xoc = x̄ + γ(xr − x̄)
# If the test point xr is almost as bad as the worst vertex, then we contract along this direction
to see whether we have pushed too far in this direction.
7. Conditional Execution:
If f (xoc ) ≤ f (xr ), then replace the worst vertex xn+1 with xoc ; Next Iteration.
Else do the Shrink step.
8. Inside Contraction:
Compute the inside contraction point
xic = x̄ − γ(xr − x̄)
# If the test point xr is even worse than the worst vertex, then we contract further along this
direction to be on the same side as xn+1.
9. Conditional Execution:
If f (xic ) < f (xn+1 ), then replace the worst vertex xn+1 with xic ; Next Iteration.
Else do the Shrink step.
10. Shrink:
Shrink the simplex toward the best vertex. Assign, for 2 ≤ j ≤ n + 1,
xj = x1 + δ(xj − x1 )
Next Iteration.
# If moving away from the worst vertex xn+1 alone is not useful, we have to consider moving
towards the best vertex first before moving away from the worst vertex again. So this is a
back-and-forth process, like the movement of an amoeba.
Even when f is sufficiently smooth and the gradient/Hessian is not expensive to evaluate, the downhill simplex method can still provide good initial guidance before Newton's method and quasi-Newton methods are employed to refine the solution once we are sufficiently close.
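For reference, here is a minimal MATLAB sketch of the scheme above, assuming f is a function handle and the initial simplex is supplied as the columns of an n-by-(n+1) matrix X; the function name downhill_simplex, the stopping test on f(xn+1) − f(x1), and the parameter values α = 1, β = 2, γ = 1/2, δ = 1/2 are choices made for this illustration only.

function [x, fx] = downhill_simplex(f, X, tol, maxit)
% Downhill simplex (Nelder-Mead) method following steps 1-10 above.
    alpha = 1; beta = 2; gamma = 0.5; delta = 0.5;
    n = size(X, 1);
    fX = zeros(1, n + 1);
    for j = 1:n+1
        fX(j) = f(X(:, j));
    end
    for it = 1:maxit
        [fX, idx] = sort(fX);                   % 1. sort the vertices
        X = X(:, idx);
        if fX(n+1) - fX(1) < tol
            break
        end
        xbar = mean(X(:, 1:n), 2);              % centroid of the n best vertices
        xr = xbar + alpha*(xbar - X(:, n+1));   % 2. reflection
        fr = f(xr);
        if fX(1) <= fr && fr < fX(n)
            X(:, n+1) = xr;  fX(n+1) = fr;
        elseif fr < fX(1)                       % 4. expansion
            xe = xbar + beta*(xr - xbar);
            fe = f(xe);
            if fe < fr
                X(:, n+1) = xe;  fX(n+1) = fe;
            else
                X(:, n+1) = xr;  fX(n+1) = fr;
            end
        elseif fr < fX(n+1)                     % 6. outside contraction
            xoc = xbar + gamma*(xr - xbar);
            foc = f(xoc);
            if foc <= fr
                X(:, n+1) = xoc;  fX(n+1) = foc;
            else                                % 10. shrink
                for j = 2:n+1
                    X(:, j) = X(:, 1) + delta*(X(:, j) - X(:, 1));
                    fX(j) = f(X(:, j));
                end
            end
        else                                    % 8. inside contraction
            xic = xbar - gamma*(xr - xbar);
            fic = f(xic);
            if fic < fX(n+1)
                X(:, n+1) = xic;  fX(n+1) = fic;
            else                                % 10. shrink
                for j = 2:n+1
                    X(:, j) = X(:, 1) + delta*(X(:, j) - X(:, 1));
                    fX(j) = f(X(:, j));
                end
            end
        end
    end
    [fx, j] = min(fX);
    x = X(:, j);
end

For example, downhill_simplex(@(x) (x(1)-1)^2 + 100*(x(2)-x(1)^2)^2, [0 1 0; 0 0 1], 1e-10, 5000) drives the simplex towards the minimiser (1, 1) of the Rosenbrock function.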
4 Tasks
The following are the actual tasks for this project.
4.1 Task 1
Let x, θ ∈ R5, y ∈ {0, 1}, and hθ(x) = g(θT x), where g(z) = 1/(1 + e^(−z)). Solve arg min_θ J(θ), where

J(θ) = (1/n) Σ_{i=1}^{n} [ −yi ln hθ(xi) − (1 − yi) ln(1 − hθ(xi)) ].
The sets {yi } and {xi } are provided in text files on Canvas.
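As a sanity check, a minimal MATLAB sketch for evaluating J(θ) is given below; it assumes, purely for illustration, that the data from Canvas have been loaded into an n-by-5 matrix X (one sample xi per row) and an n-by-1 vector y, which may differ from the actual file format.

function J = logistic_cost(theta, X, y)
% Evaluate J(theta): theta is 5-by-1, X is n-by-5, y is n-by-1 with entries in {0, 1}.
    h = 1 ./ (1 + exp(-X*theta));                    % h_theta(x_i) for all i
    J = mean(-y .* log(h) - (1 - y) .* log(1 - h));
end

Minimising J could then be attempted, for instance, with the quasi-Newton or downhill simplex routines sketched above.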
4.2 Task 2
Suppose

π(x) = exp( Σ_{j=0}^{n} λj x^j )   for x ∈ (0, 1).

Solve

arg min_λ ( ∫_0^1 π(x) dx − Σ_{i=0}^{n} λi mi ),   for i = 0, 1, 2, ..., n.
The ordered set {m0 , m1 , m2 , ... mn } is provided in text file m1.txt on Canvas.
4.3 Task 3
Discuss whether the following integral is solvable or not. If so, proceed with it; otherwise, prove that it cannot be solved.
∫_0^{1/2} · · · ∫_0^1 Π_{i<j} ( (ui − uj) / (ui + uj) )² du1/u1 · · · du5/u5
4.4 Reminder
The material provided with this document should serve as a starting point for your research. Higher marks will be given to groups that go beyond what has been given and covered in class. You need to find the right numerical methods and implement them correctly by writing your own MATLAB functions from scratch to complete these tasks. You are not allowed to use high-level MATLAB functions that avoid implementing key algorithms in your solution. For example, high-level functions including, but not limited to, the following are not allowed: mldivide, qr, svd, vpasolve, integral, fminbnd.