Exam 2: Solution

1.
a) From Taylor's Theorem,

f(x^{k+1}) = f(x^k) + f'(x^k)(x^{k+1} - x^k) + \frac{1}{2} f''(x^k)(x^{k+1} - x^k)^2 + \cdots

We want to choose x^{k+1} such that f(x^{k+1}) = 0. Apply this to Taylor's Theorem,

0 = f(x^k) + f'(x^k)(x^{k+1} - x^k) + \frac{1}{2} f''(x^k)(x^{k+1} - x^k)^2 + \cdots
Now solve for x^{k+1} using only the first-order terms,

f'(x^k)(x^{k+1} - x^k) = -f(x^k) - \frac{1}{2} f''(x^k)(x^{k+1} - x^k)^2 - \cdots

x^{k+1} - x^k = -\frac{f(x^k)}{f'(x^k)} - \frac{f''(x^k)}{2 f'(x^k)} (x^{k+1} - x^k)^2 - \cdots

x^{k+1} = x^k - \frac{f(x^k)}{f'(x^k)} - \frac{f''(x^k)}{2 f'(x^k)} (x^{k+1} - x^k)^2 - \cdots
Using the first-order terms as our approximation, the Newton-Raphson method is

x^{k+1} = x^k - \frac{f(x^k)}{f'(x^k)}
The next term in the series gives us the best approximation of the behavior of the error. If we define
\Delta x = x^{k+1} - x^k, then

E_a = -\frac{f''(x^k)}{2 f'(x^k)} \Delta x^2
Since the leading values are just constants, we can say that the Newton-Raphson method has error that is O(\Delta x^2), which is also referred to as quadratic convergence.
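As a concrete illustration of this update equation and its stopping behavior, here is a minimal Python sketch of the Newton-Raphson iteration; the test function f(x) = x^2 - 2, the starting point, and the tolerance are arbitrary choices for demonstration.

```python
def newton_raphson(f, fprime, x0, tol=1e-10, max_iter=50):
    """Newton-Raphson iteration: x_{k+1} = x_k - f(x_k)/f'(x_k)."""
    x = x0
    for _ in range(max_iter):
        step = f(x) / fprime(x)
        x = x - step
        if abs(step) < tol:  # stop once the update is small
            return x
    return x

# Example: root of f(x) = x^2 - 2 starting from x0 = 1 (converges to sqrt(2))
root = newton_raphson(lambda x: x**2 - 2, lambda x: 2 * x, x0=1.0)
print(root)
```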
b) The Newton-Raphson method has a very clear graphical interpretation. Taking the derivative at the point x^k, we have f'(x^k), which gives the slope of the line tangent to the function at x^k. If we then find the x-axis crossing of the tangent line, this is x^{k+1}, as can be seen below. Thus, the Newton-Raphson method approximates the function as a line whose slope equals the derivative at the current point. If the function is actually a line (or resembles one closely enough near the root), then the method can converge in a single iteration.
2.
a) The bisection method cannot be used for optimization problems because, unlike root-finding problems, we cannot determine where the minimum lies by dividing the bracket into two intervals. We need three intervals (two interior points) to determine where the minimum lies. The problem then becomes how to choose the two interior points in an intelligent way. By specifically using the golden ratio to define the interior points, it is possible to reuse one of the interior points from iteration to iteration. This is advantageous because it requires only a single new function evaluation at each iteration, which is more efficient than the two function evaluations needed to place two new interior points.
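To make the reuse of an interior point concrete, here is a minimal Python sketch of a golden-section search; the test function, bracket, and tolerance are arbitrary choices.

```python
import math

def golden_section(f, xl, xu, tol=1e-8, max_iter=200):
    """Golden-section search for a minimum on [xl, xu]."""
    phi = (math.sqrt(5) - 1) / 2          # golden ratio factor ~0.618
    d = phi * (xu - xl)
    x1, x2 = xl + d, xu - d               # two interior points
    f1, f2 = f(x1), f(x2)
    for _ in range(max_iter):
        if f1 < f2:                       # minimum lies in [x2, xu]
            xl, x2, f2 = x2, x1, f1       # reuse old x1 as the new x2
            x1 = xl + phi * (xu - xl)
            f1 = f(x1)                    # only one new evaluation
        else:                             # minimum lies in [xl, x1]
            xu, x1, f1 = x1, x2, f2       # reuse old x2 as the new x1
            x2 = xu - phi * (xu - xl)
            f2 = f(x2)
        if abs(xu - xl) < tol:
            break
    return (xl + xu) / 2

# Example: minimum of (x - 1.3)^2 on [0, 4]
print(golden_section(lambda x: (x - 1.3)**2, 0.0, 4.0))
```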
b) Parabolic interpolation attempts to represent three points on the function as a parabola since, “close” enough to the minimum, the function will look like a parabola. This is analogous to the false-position method for root-finding. The update point is the location of the minimum of the interpolating parabola. With four points (three intervals), we can determine which interval the minimum must lie in and reduce the size of the bracketing region.
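A minimal sketch of a single parabolic-interpolation step, using the standard vertex formula for the parabola through three points; the test function and sample points are arbitrary.

```python
def parabolic_step(f, x1, x2, x3):
    """Return the x-location of the vertex of the parabola through
    (x1, f(x1)), (x2, f(x2)), (x3, f(x3))."""
    f1, f2, f3 = f(x1), f(x2), f(x3)
    num = (x2 - x1)**2 * (f2 - f3) - (x2 - x3)**2 * (f2 - f1)
    den = (x2 - x1) * (f2 - f3) - (x2 - x3) * (f2 - f1)
    return x2 - 0.5 * num / den

# Example: one step toward the minimum of f(x) = (x - 1.3)^2 + 1;
# since f is itself a parabola, one step lands exactly on x = 1.3.
print(parabolic_step(lambda x: (x - 1.3)**2 + 1, 0.0, 1.0, 4.0))
```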
3.
y_i = a_1 e^{a_2 x_i}

a) Taking the natural log of both sides of the equation,

\ln(y_i) = \ln(a_1 e^{a_2 x_i}) = \ln(a_1) + \ln(e^{a_2 x_i}) = \ln(a_1) + a_2 x_i

Defining y'_i = \ln(y_i), b_1 = \ln(a_1), z_{i,1} = 1, b_2 = a_2, and z_{i,2} = x_i, we get the general linear least-squares representation

y'_i = b_1 z_{i,1} + b_2 z_{i,2}
b) From the equation we can write a row of H as

h_i = \begin{bmatrix} z_{i,1} & z_{i,2} \end{bmatrix}

Putting this into matrix form,

H = \begin{bmatrix} z_1 & z_2 \end{bmatrix}

and from the previous definitions,

H = \begin{bmatrix} 1 & x \end{bmatrix}

The normal equations can be written as

H^\top H \, b = H^\top y'

Applying our specific definition of H,

H^\top H = \begin{bmatrix} 1^\top \\ x^\top \end{bmatrix} \begin{bmatrix} 1 & x \end{bmatrix}
= \begin{bmatrix} 1^\top 1 & 1^\top x \\ x^\top 1 & x^\top x \end{bmatrix}
= \begin{bmatrix} n & \sum x_i \\ \sum x_i & \sum x_i^2 \end{bmatrix}

H^\top y' = \begin{bmatrix} 1^\top \\ x^\top \end{bmatrix} \ln(y)
= \begin{bmatrix} 1^\top \ln(y) \\ x^\top \ln(y) \end{bmatrix}
= \begin{bmatrix} \sum \ln(y_i) \\ \sum x_i \ln(y_i) \end{bmatrix}

Resulting in the linear algebraic equations,

\begin{bmatrix} n & \sum x_i \\ \sum x_i & \sum x_i^2 \end{bmatrix}
\begin{bmatrix} b_1 \\ b_2 \end{bmatrix}
= \begin{bmatrix} \sum \ln(y_i) \\ \sum x_i \ln(y_i) \end{bmatrix}
c) The optimization problem can be set up as

S_r = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left( y_i - a_1 e^{a_2 x_i} \right)^2

To find the optimal coefficients, we need \nabla S_r = 0 with respect to the coefficients. This gives a system of nonlinear equations

\frac{\partial S_r}{\partial a_1} = -2 \sum \left( y_i - a_1 e^{a_2 x_i} \right) e^{a_2 x_i} = 0

\frac{\partial S_r}{\partial a_2} = -2 a_1 \sum \left( y_i - a_1 e^{a_2 x_i} \right) x_i e^{a_2 x_i} = 0
d) These represent a multi-dimensional root-finding problem of the form f(a) = 0. In class, we discussed two ways to solve this type of problem. The first would use a fixed-point method, where we define a vector-valued function g such that a = g(a). The second is to use Newton-Raphson. This would require determining the 2-by-2 Jacobian matrix but would lead to quadratic convergence. In either case, we could use the results of the linear transformation as an initial guess for the nonlinear problem, which would probably allow our numerical method to converge.
This nonlinear analysis will give a different result than the linear analysis because you are minimizing with respect to two different sets of data (transformed vs. untransformed). Accordingly, the resulting coefficients will give similar curves, but not the same ones.
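As a sketch of the Newton-Raphson route, the two-equation system \nabla S_r = 0 can be solved numerically. Here the Jacobian of the gradient is approximated by finite differences rather than derived analytically, the linearized fit supplies the initial guess, and the data are the same made-up values as above.

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.2, 5.6, 9.1, 14.5])

def grad_Sr(a):
    """Gradient of Sr = sum (y_i - a1*exp(a2*x_i))^2 with respect to (a1, a2)."""
    a1, a2 = a
    r = y - a1 * np.exp(a2 * x)                     # residuals
    dSda1 = -2.0 * np.sum(r * np.exp(a2 * x))
    dSda2 = -2.0 * a1 * np.sum(r * x * np.exp(a2 * x))
    return np.array([dSda1, dSda2])

def newton_system(g, a0, tol=1e-10, max_iter=50, h=1e-6):
    """Newton-Raphson for g(a) = 0 with a forward-difference Jacobian."""
    a = np.array(a0, dtype=float)
    for _ in range(max_iter):
        ga = g(a)
        J = np.empty((a.size, a.size))
        for j in range(a.size):                     # build Jacobian column by column
            ap = a.copy()
            ap[j] += h
            J[:, j] = (g(ap) - ga) / h
        da = np.linalg.solve(J, -ga)
        a += da
        if np.linalg.norm(da) < tol:
            break
    return a

# Initial guess (a1, a2) taken from the linearized fit
a_opt = newton_system(grad_Sr, a0=[2.0, 0.5])
print(a_opt)
```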
4.
a) We are trying to fit n data points with a unique (n − 1)th-order polynomial. Thus, for 50 data points, we would fit a 49th-order polynomial.
The 49th-order polynomial is unique. Thus, regardless of whether we use a Newton or Lagrange polynomial, the polynomial is the same. Therefore, they both fit the data the same, and the only difference would come from computational efficiency.
The 49th-order polynomial is high order (even higher than the 20th-order polynomial of your homework). Thus, we expect there to be oscillations in the interpolating polynomial, which will greatly affect the interpolation in certain areas (generally near the endpoints).
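A minimal sketch of this behavior using scipy's barycentric representation of the interpolating polynomial; the Runge-type test function and the choice of 50 equally spaced nodes are arbitrary, but the evaluation near the endpoint shows the expected oscillation.

```python
import numpy as np
from scipy.interpolate import BarycentricInterpolator

# 50 equally spaced samples of a Runge-type test function on [-1, 1]
x = np.linspace(-1.0, 1.0, 50)
y = 1.0 / (1.0 + 25.0 * x**2)

# The unique 49th-order interpolating polynomial (barycentric form)
p = BarycentricInterpolator(x, y)

# Evaluate between the nodes near an endpoint: the interpolant oscillates
# wildly there even though the underlying function stays between 0 and 1.
xe = np.linspace(-1.0, -0.9, 11)
print(p(xe))
```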
b) We are trying to fit each interval with a cubic polynomial of the form

s_i(x) = a_i + b_i x + c_i x^2 + d_i x^3

which gives 4 unknown coefficients per interval and a total of 4(n − 1) unknown coefficients. In this case that is 196 unknown coefficients.
We apply four major sets of constraints
1. Must pass through all data points (n = 50 constraints)
2. Splines must come together at interior points / Continuity (n − 2 = 48 constraints)
3. First derivatives of splines must be the same at interior points / First-derivative continuity (n − 2 = 48 constraints)
4. Second derivatives of splines must be the same at interior points / Second-derivative continuity (n − 2 = 48 constraints)
This gives a total of 4(n − 1) − 2 = 194 constraints, which leaves us two short of solving for all of the unknown coefficients. We can then apply our choice of 2 additional conditions. These could be the ones we discussed in class (natural, clamped-end, or not-a-knot), although they can really be any two conditions that you want.
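A minimal sketch using scipy's CubicSpline, whose bc_type argument selects the two additional end conditions; the sample data are arbitrary.

```python
import numpy as np
from scipy.interpolate import CubicSpline

# Arbitrary sample data (in the exam's setting there would be n = 50 points)
x = np.linspace(0.0, 2.0 * np.pi, 10)
y = np.sin(x)

# The two extra conditions are chosen through bc_type:
s_natural = CubicSpline(x, y, bc_type='natural')      # zero second derivatives at the ends
s_notaknot = CubicSpline(x, y, bc_type='not-a-knot')  # third-derivative continuity at the outermost interior knots
s_clamped = CubicSpline(x, y, bc_type='clamped')      # zero first derivatives at the ends

print(s_natural(1.0), s_notaknot(1.0), s_clamped(1.0))
```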
5.
a) The three assumptions are:
1. xi is a fixed value (not random and no error)
2. yi is an independent random variable and each yi has the same standard deviation (σ)
3. yi is normally distributed
b) The subintervals [a, b] and [b, c] both satisfy the bracketing condition that the function values at their endpoints have opposite signs (f(a)f(b) < 0 and f(b)f(c) < 0), which means that each interval contains at least one root. Thus, the interval [a, c] must contain at least 2 roots. You cannot say, however, exactly how many roots are in the interval.
c) There are many different fixed-point methods that can be proposed because they are non-unique. One possible fixed-point function is

g(x) = \frac{x^2 \cos(x)}{e^{-2x}} = x^2 \cos(x) e^{2x}

Although others are possible, this particular one is a good choice because g(x) is continuous, whereas others might have division-by-zero problems. The condition for convergence is that |g'(x)| < 1.
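A minimal sketch of fixed-point iteration with this g(x); the starting guess and tolerance are arbitrary, and whether the iteration converges, and to which fixed point, depends on the starting guess and on whether |g'(x)| < 1 near that point.

```python
import math

def fixed_point(g, x0, tol=1e-10, max_iter=100):
    """Fixed-point iteration x_{k+1} = g(x_k)."""
    x = x0
    for _ in range(max_iter):
        x_new = g(x)
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x

# g(x) = x^2 cos(x) e^{2x}
g = lambda x: x**2 * math.cos(x) * math.exp(2 * x)
print(fixed_point(g, x0=0.2))
```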
d) The difference between the Newton-Raphson and secant methods is that Newton-Raphson uses the true derivative, whereas the secant method uses a finite difference to approximate the derivative. We expect the approximation to be close to the true derivative when the step size is small. This generally occurs when the method is close to converging.
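A minimal sketch of the secant update, showing how it replaces f'(x^k) in the Newton-Raphson step with a finite difference built from the two most recent iterates; the test function and starting points are arbitrary.

```python
def secant(f, x0, x1, tol=1e-10, max_iter=50):
    """Secant method: Newton-Raphson with f'(x_k) replaced by
    (f(x_k) - f(x_{k-1})) / (x_k - x_{k-1})."""
    for _ in range(max_iter):
        f0, f1 = f(x0), f(x1)
        slope = (f1 - f0) / (x1 - x0)   # finite-difference "derivative"
        x2 = x1 - f1 / slope            # same form as the Newton step
        if abs(x2 - x1) < tol:
            return x2
        x0, x1 = x1, x2
    return x1

# Example: root of f(x) = x^2 - 2 from the two starting points 1 and 2
print(secant(lambda x: x**2 - 2, 1.0, 2.0))
```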
e) The update equation for the method of steepest descent is

x^{k+1} = x^k - \alpha \nabla f(x^k)

The direction of maximum increase is the gradient. Since we are performing minimization, it makes sense for us to travel in the direction of steepest descent, which is -\nabla f. We try to control the length of the shift in that direction by including the scalar step size \alpha. In addition, as the method converges the gradient goes to zero, so the update stops changing.
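A minimal sketch of the steepest-descent update with a fixed step size \alpha; the quadratic test function, the value of \alpha, and the starting point are arbitrary.

```python
import numpy as np

def steepest_descent(grad, x0, alpha=0.1, tol=1e-8, max_iter=1000):
    """Steepest descent: x_{k+1} = x_k - alpha * grad(x_k)."""
    x = np.array(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        x = x - alpha * g
        if np.linalg.norm(g) < tol:     # gradient -> 0 as the method converges
            break
    return x

# Example: minimize f(x1, x2) = (x1 - 1)^2 + 2*(x2 + 3)^2
grad_f = lambda x: np.array([2 * (x[0] - 1), 4 * (x[1] + 3)])
print(steepest_descent(grad_f, x0=[0.0, 0.0]))
```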
f) In general, we expect the false-position method to converge faster because it uses additional information and tries to approximate the function as a line. This is especially effective when the region is “close” to the root. On the other hand, the bisection method always divides the region in half without taking the function into account.
Bisection may be more efficient when the function has a lot of curvature, so that the false-position method does a poor job of approximating its behavior.
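To make the two update rules concrete, here is a minimal sketch of one bracketing step of each method; the test function and bracket are arbitrary.

```python
def bisection_step(f, xl, xu):
    """One bisection step: the new estimate is the midpoint of the bracket."""
    xr = (xl + xu) / 2
    return (xl, xr) if f(xl) * f(xr) < 0 else (xr, xu)

def false_position_step(f, xl, xu):
    """One false-position step: the new estimate is where the straight line
    through (xl, f(xl)) and (xu, f(xu)) crosses zero."""
    xr = xu - f(xu) * (xl - xu) / (f(xl) - f(xu))
    return (xl, xr) if f(xl) * f(xr) < 0 else (xr, xu)

# Example bracket [1, 2] for f(x) = x^2 - 2
f = lambda x: x**2 - 2
print(bisection_step(f, 1.0, 2.0), false_position_step(f, 1.0, 2.0))
```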
g)
f_1 = x_1^3 x_3 - x_2 + x_2^2 e^{x_3} - 5
f_2 = x_1^2 x_2 x_3 - 3 x_3^4 + x_3 \cos(x_2) + 3
f_3 = 2 x_2^2 x_3^2 + x_2 e^{-2 x_1} - 8

J = \begin{bmatrix}
\frac{\partial f_1}{\partial x_1} & \frac{\partial f_1}{\partial x_2} & \frac{\partial f_1}{\partial x_3} \\
\frac{\partial f_2}{\partial x_1} & \frac{\partial f_2}{\partial x_2} & \frac{\partial f_2}{\partial x_3} \\
\frac{\partial f_3}{\partial x_1} & \frac{\partial f_3}{\partial x_2} & \frac{\partial f_3}{\partial x_3}
\end{bmatrix}
= \begin{bmatrix}
3 x_1^2 x_3 & 2 x_2 e^{x_3} - 1 & x_1^3 + x_2^2 e^{x_3} \\
2 x_1 x_2 x_3 & x_1^2 x_3 - x_3 \sin(x_2) & x_1^2 x_2 - 12 x_3^3 + \cos(x_2) \\
-2 x_2 e^{-2 x_1} & 4 x_2 x_3^2 + e^{-2 x_1} & 4 x_2^2 x_3
\end{bmatrix}
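A minimal numpy sketch that checks the analytic Jacobian above against a forward-difference approximation at an arbitrary test point.

```python
import numpy as np

def F(x):
    x1, x2, x3 = x
    return np.array([
        x1**3 * x3 - x2 + x2**2 * np.exp(x3) - 5,
        x1**2 * x2 * x3 - 3 * x3**4 + x3 * np.cos(x2) + 3,
        2 * x2**2 * x3**2 + x2 * np.exp(-2 * x1) - 8,
    ])

def J_analytic(x):
    x1, x2, x3 = x
    return np.array([
        [3 * x1**2 * x3, 2 * x2 * np.exp(x3) - 1, x1**3 + x2**2 * np.exp(x3)],
        [2 * x1 * x2 * x3, x1**2 * x3 - x3 * np.sin(x2), x1**2 * x2 - 12 * x3**3 + np.cos(x2)],
        [-2 * x2 * np.exp(-2 * x1), 4 * x2 * x3**2 + np.exp(-2 * x1), 4 * x2**2 * x3],
    ])

# Forward-difference Jacobian at an arbitrary test point
x0 = np.array([0.7, -0.4, 1.2])
h = 1e-7
J_fd = np.column_stack([(F(x0 + h * e) - F(x0)) / h for e in np.eye(3)])
print(np.max(np.abs(J_fd - J_analytic(x0))))   # should be small (finite-difference error)
```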
h) We could change the golden-section search to look for larger values at each iteration. It is far easier, however, to use the existing minimization form and find the minima of −f(x), which will be the maxima of f(x).
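A minimal sketch of this approach using scipy's golden-section minimizer applied to −f(x); the test function and bracket are arbitrary.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Maximize f(x) = x * exp(-x) by minimizing -f(x) with a golden-section search
f = lambda x: x * np.exp(-x)
res = minimize_scalar(lambda x: -f(x), bracket=(0.0, 1.0, 3.0), method='golden')

print(res.x, f(res.x))   # maximum near x = 1
```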