Exam 2: Solution

1. a) From Taylor's Theorem,
\[
f(x^{k+1}) = f(x^k) + f'(x^k)(x^{k+1} - x^k) + \frac{1}{2} f''(x^k)(x^{k+1} - x^k)^2 + \dots
\]
We want to choose $x^{k+1}$ such that $f(x^{k+1}) = 0$. Applying this to Taylor's Theorem,
\[
0 = f(x^k) + f'(x^k)(x^{k+1} - x^k) + \frac{1}{2} f''(x^k)(x^{k+1} - x^k)^2 + \dots
\]
Now solve for $x^{k+1}$ using only the first-order terms,
\[
f'(x^k)(x^{k+1} - x^k) = -f(x^k) - \frac{1}{2} f''(x^k)(x^{k+1} - x^k)^2 - \dots
\]
\[
x^{k+1} - x^k = -\frac{f(x^k)}{f'(x^k)} - \frac{f''(x^k)}{2 f'(x^k)}(x^{k+1} - x^k)^2 - \dots
\]
\[
x^{k+1} = x^k - \frac{f(x^k)}{f'(x^k)} - \frac{f''(x^k)}{2 f'(x^k)}(x^{k+1} - x^k)^2 - \dots
\]
Using the first-order terms as our approximation, the Newton-Raphson method is
\[
x^{k+1} = x^k - \frac{f(x^k)}{f'(x^k)}
\]
(a short code sketch of this update is given after problem 2). The next term in the series gives the best approximation of the behavior of the error. If we define $\Delta x = x^{k+1} - x^k$, then
\[
E_a = -\frac{f''(x^k)}{2 f'(x^k)} \Delta x^2
\]
Since the leading values are just constants, the Newton-Raphson method has error that is $O(\Delta x^2)$, which is referred to as quadratic convergence.

b) The Newton-Raphson method has a very clear graphical interpretation. Taking the derivative at the point $x^k$ gives $f'(x^k)$, the slope of the line tangent to the function at $x^k$. The x-axis crossing of that tangent line is $x^{k+1}$. Thus, the Newton-Raphson method approximates the function as a line whose slope equals the derivative at the current point. If the function actually is a line (or, once we get close enough, resembles one), the method can converge in a single iteration.

2. a) The bisection method cannot be used for optimization problems because, unlike root-finding problems, we cannot determine where the minimum lies by dividing the bracket into two intervals. We need three intervals (two interior points) to determine where the minimum lies. The problem then becomes choosing the two interior points in an intelligent way. By specifically using the golden ratio to define the interior points, it is possible to keep one of the interior points from iteration to iteration. This is advantageous because it requires only a single new function evaluation at each iteration, which is more efficient than the two function evaluations needed to place two new interior points (a code sketch is given below, after part b).

b) Parabolic interpolation attempts to represent three points on the function as a parabola, since "close" enough to the minimum the function will look like a parabola. This is analogous to the false-position method for root-finding. The update is the location of the minimum of the estimating parabola. With four points (three intervals), we can determine which intervals the minimum must lie in and reduce the size of the bracketing region.
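Returning to problem 1a, the update $x^{k+1} = x^k - f(x^k)/f'(x^k)$ lends itself to a very short implementation. The sketch below is only illustrative: the example function $f(x) = x^2 - 2$, the initial guess, the tolerance, and the iteration cap are assumptions, not part of the exam.

    # Minimal Newton-Raphson sketch for problem 1a.
    def newton_raphson(f, dfdx, x0, tol=1e-10, max_iter=50):
        x = x0
        for _ in range(max_iter):
            step = f(x) / dfdx(x)        # f(x^k) / f'(x^k)
            x = x - step                 # x^{k+1} = x^k - f(x^k)/f'(x^k)
            if abs(step) < tol:          # stop once the update is tiny
                break
        return x

    # Example: root of f(x) = x**2 - 2 near x = 1, i.e. sqrt(2).
    root = newton_raphson(lambda x: x**2 - 2, lambda x: 2*x, x0=1.0)

Because each new error is proportional to the square of the previous one, the number of correct digits roughly doubles per iteration once the guess is close to the root.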
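A similarly minimal sketch of the golden-section search from problem 2a follows; note how each pass of the loop keeps one interior point and makes only a single new function evaluation. The example function $f(x) = (x-2)^2$ and the bracket $[0, 5]$ are assumptions for illustration.

    import math

    # Golden-section search sketch for problem 2a: one new evaluation per loop.
    def golden_section(f, a, b, tol=1e-8):
        invphi = (math.sqrt(5) - 1) / 2          # 1/phi, about 0.618
        x1 = b - invphi * (b - a)                # left interior point
        x2 = a + invphi * (b - a)                # right interior point
        f1, f2 = f(x1), f(x2)
        while (b - a) > tol:
            if f1 < f2:                          # minimum lies in [a, x2]
                b, x2, f2 = x2, x1, f1           # reuse x1 as the new x2
                x1 = b - invphi * (b - a)
                f1 = f(x1)                       # the single new evaluation
            else:                                # minimum lies in [x1, b]
                a, x1, f1 = x1, x2, f2           # reuse x2 as the new x1
                x2 = a + invphi * (b - a)
                f2 = f(x2)                       # the single new evaluation
        return (a + b) / 2

    # Example: minimum of f(x) = (x - 2)**2 bracketed by [0, 5].
    xmin = golden_section(lambda x: (x - 2)**2, 0.0, 5.0)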
3.
\[
y_i = a_1 e^{a_2 x_i}
\]
a) Taking the natural log of both sides of the equation,
\[
\ln(y_i) = \ln\!\left(a_1 e^{a_2 x_i}\right) = \ln(a_1) + \ln\!\left(e^{a_2 x_i}\right) = \ln(a_1) + a_2 x_i
\]
Defining $y_i' = \ln(y_i)$, $b_1 = \ln(a_1)$, $z_{i,1} = 1$, $b_2 = a_2$, and $z_{i,2} = x_i$, we get the general linear least-squares representation
\[
y_i' = b_1 z_{i,1} + b_2 z_{i,2}
\]

b) From this equation we can write a row of $H$ as
\[
h_i = \begin{bmatrix} z_{i,1} & z_{i,2} \end{bmatrix}
\]
Putting it into matrix form,
\[
H = \begin{bmatrix} \mathbf{z}_1 & \mathbf{z}_2 \end{bmatrix}
\]
and from the previous definitions
\[
H = \begin{bmatrix} \mathbf{1} & \mathbf{x} \end{bmatrix}
\]
The normal equations can be written as
\[
H^\top H \mathbf{b} = H^\top \mathbf{y}'
\]
Applying our specific definition of $H$,
\[
H^\top H = \begin{bmatrix} \mathbf{1}^\top \\ \mathbf{x}^\top \end{bmatrix}
\begin{bmatrix} \mathbf{1} & \mathbf{x} \end{bmatrix}
= \begin{bmatrix} \mathbf{1}^\top \mathbf{1} & \mathbf{1}^\top \mathbf{x} \\ \mathbf{x}^\top \mathbf{1} & \mathbf{x}^\top \mathbf{x} \end{bmatrix}
= \begin{bmatrix} n & \sum x_i \\ \sum x_i & \sum x_i^2 \end{bmatrix}
\]
\[
H^\top \mathbf{y}' = \begin{bmatrix} \mathbf{1}^\top \\ \mathbf{x}^\top \end{bmatrix} \ln(\mathbf{y})
= \begin{bmatrix} \mathbf{1}^\top \ln(\mathbf{y}) \\ \mathbf{x}^\top \ln(\mathbf{y}) \end{bmatrix}
= \begin{bmatrix} \sum \ln(y_i) \\ \sum x_i \ln(y_i) \end{bmatrix}
\]
This results in the linear algebraic equations
\[
\begin{bmatrix} n & \sum x_i \\ \sum x_i & \sum x_i^2 \end{bmatrix}
\begin{bmatrix} b_1 \\ b_2 \end{bmatrix}
=
\begin{bmatrix} \sum \ln(y_i) \\ \sum x_i \ln(y_i) \end{bmatrix}
\]

c) The optimization problem can be set up as
\[
S_r = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left( y_i - a_1 e^{a_2 x_i} \right)^2
\]
To find the optimal coefficients, we need $\nabla S_r = 0$ with respect to the coefficients. This gives a system of nonlinear equations,
\[
\frac{\partial S_r}{\partial a_1} = -2 \sum \left( y_i - a_1 e^{a_2 x_i} \right) e^{a_2 x_i} = 0
\]
\[
\frac{\partial S_r}{\partial a_2} = -2 a_1 \sum \left( y_i - a_1 e^{a_2 x_i} \right) x_i e^{a_2 x_i} = 0
\]

d) These represent a multi-dimensional root-finding problem of the form $\mathbf{f}(\mathbf{a}) = \mathbf{0}$. In class, we discussed two ways to solve this type of problem. The first would use a fixed-point method where we define a vector-valued function $\mathbf{g}$ such that $\mathbf{a} = \mathbf{g}(\mathbf{a})$. The second is to use Newton-Raphson, which would require determining the $2 \times 2$ Jacobian matrix but would lead to quadratic convergence. In either case, we could use the results of the linear transformation as an initial guess for the nonlinear problem, which would likely allow our numerical method to converge (both steps are sketched in code below). This nonlinear analysis will give a different result than the linear analysis because we are minimizing with respect to two different sets of data (transformed vs. untransformed). Accordingly, the resulting coefficients will give similar curves, but not the same ones.
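The two fitting approaches of problem 3 can be sketched together: the linearized fit of parts a) and b) solves the normal equations for $\mathbf{b}$, and its result seeds the Newton-Raphson iteration of part d). The synthetic data below are an illustrative assumption, and for brevity the $2 \times 2$ Jacobian of the gradient is approximated by finite differences rather than derived analytically.

    import numpy as np

    # Problem 3 sketch: linearized fit (parts a/b) followed by Newton-Raphson
    # on grad(Sr) = 0 (parts c/d).  Synthetic data; finite-difference Jacobian.
    rng = np.random.default_rng(0)
    x = np.linspace(0.0, 2.0, 20)
    y = 2.5 * np.exp(1.3 * x) * (1.0 + 0.02 * rng.standard_normal(x.size))

    # Linearized fit: H = [1  x], y' = ln(y), solve H^T H b = H^T y'.
    H = np.column_stack((np.ones_like(x), x))
    b = np.linalg.solve(H.T @ H, H.T @ np.log(y))
    a = np.array([np.exp(b[0]), b[1]])             # a1 = e^{b1}, a2 = b2

    def grad_Sr(a):
        r = y - a[0] * np.exp(a[1] * x)            # residuals
        return np.array([-2.0 * np.sum(r * np.exp(a[1] * x)),
                         -2.0 * a[0] * np.sum(r * x * np.exp(a[1] * x))])

    # Newton-Raphson on the 2-equation system, seeded with the linearized fit.
    h = 1e-6
    for _ in range(50):
        F = grad_Sr(a)
        J = np.column_stack([(grad_Sr(a + h * np.eye(2)[:, j]) - F) / h
                             for j in range(2)])   # finite-difference Jacobian
        step = np.linalg.solve(J, F)
        a = a - step                               # a^{k+1} = a^k - J^{-1} f(a^k)
        if np.linalg.norm(step) < 1e-10:
            break

    a1_nl, a2_nl = a                               # refined nonlinear coefficients

As noted in part d), the refined coefficients will be close to, but not exactly equal to, those recovered from the transformed (logged) data.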
4. a) We are trying to fit $n$ data points with a unique $(n-1)$th-order polynomial. Thus, for 50 data points we would fit a 49th-order polynomial. The 49th-order polynomial is unique, so regardless of whether we use a Newton or Lagrange polynomial, the polynomial is the same. Therefore, they both fit the data the same, and the only difference would come from computational efficiency. The 49th-order polynomial is high order (even higher than the 20th-order polynomial of your homework). Thus, we expect there to be oscillations in the interpolating polynomial which will greatly affect the interpolation in certain areas (generally near the endpoints).

b) We are trying to fit each interval with a cubic polynomial of the form
\[
s_i(x) = a_i + b_i x + c_i x^2 + d_i x^3
\]
which gives 4 unknown coefficients per interval and a total of $4(n-1)$ unknown coefficients. In this case, that is 196 unknown coefficients. We apply four major sets of constraints:
1. The spline must pass through all data points ($n = 50$ constraints).
2. Splines must come together at interior points / continuity ($n - 2 = 48$).
3. First derivatives of splines must match at interior points / first-derivative continuity ($n - 2 = 48$).
4. Second derivatives of splines must match at interior points / second-derivative continuity ($n - 2 = 48$).
This gives a total of $4(n-1) - 2 = 194$ constraints, which leaves us two short of solving for all of the unknown coefficients. We can then apply our choice of two additional conditions. These could be the ones we discussed in class (natural, clamped-end, or not-a-knot), although they can really be any two conditions you want.

5. a) The three assumptions are:
1. $x_i$ is a fixed value (not random and with no error).
2. $y_i$ is an independent random variable, and each $y_i$ has the same standard deviation ($\sigma$).
3. $y_i$ is normally distributed.

b) The subintervals $[a, b]$ and $[b, c]$ both satisfy the condition $f(x_1) f(x_2) < 0$, which means that each interval contains at least one root. Thus, the interval $[a, c]$ must contain at least two roots. You cannot say, however, exactly how many roots are in the interval.

c) There are many different fixed-point methods that can be proposed because they are non-unique. One possible fixed-point function is
\[
g(x) = x^2 \cos(x) = x^2 \cos(x)\, e^{2x} e^{-2x}
\]
Although others are possible, this particular one is a good choice because $g(x)$ is continuous, whereas others might have division-by-zero problems. The condition for convergence is that $|g'(x)| < 1$.

d) The difference between the Newton-Raphson and secant methods is that Newton-Raphson uses the true derivative, whereas the secant method uses a finite difference to approximate the derivative. We expect the approximation to be close to the true derivative when the step size is small. This generally occurs when the method is close to converging.

e) The update equation for the method of steepest descent is
\[
\mathbf{x}^{k+1} = \mathbf{x}^k - \alpha \nabla f(\mathbf{x}^k)
\]
The direction of maximum increase is the gradient. Since we are performing minimization, it makes sense to travel in the direction of steepest descent, which is $-\nabla f$. We control the length of the step in that direction by including the scalar term $\alpha$. In addition, as the method converges the gradient goes to zero, which stops the update from changing the value (a short sketch is given after part h).

f) In general, we expect the false-position method to converge faster because it uses additional information and tries to approximate the function as a line. This is especially effective when the region is "close" to the root. On the other hand, the bisection method always divides the region in half without taking the function into account. Bisection may be more efficient when the function has a lot of curvature, so that the false-position line does a poor job of approximating its behavior.

g)
\[
f_1 = x_1^3 x_3 - x_2 + x_2^2 e^{x_3} - 5
\]
\[
f_2 = x_1^2 x_2 x_3 - 3 x_3^4 + x_3 \cos x_2 + 3
\]
\[
f_3 = 2 x_2^2 x_3^2 + x_2 e^{-2 x_1} - 8
\]
\[
J = \begin{bmatrix}
\dfrac{\partial f_1}{\partial x_1} & \dfrac{\partial f_1}{\partial x_2} & \dfrac{\partial f_1}{\partial x_3} \\[2mm]
\dfrac{\partial f_2}{\partial x_1} & \dfrac{\partial f_2}{\partial x_2} & \dfrac{\partial f_2}{\partial x_3} \\[2mm]
\dfrac{\partial f_3}{\partial x_1} & \dfrac{\partial f_3}{\partial x_2} & \dfrac{\partial f_3}{\partial x_3}
\end{bmatrix}
=
\begin{bmatrix}
3 x_1^2 x_3 & 2 x_2 e^{x_3} - 1 & x_1^3 + x_2^2 e^{x_3} \\
2 x_1 x_2 x_3 & x_1^2 x_3 - x_3 \sin x_2 & x_1^2 x_2 - 12 x_3^3 + \cos x_2 \\
-2 x_2 e^{-2 x_1} & 4 x_2 x_3^2 + e^{-2 x_1} & 4 x_2^2 x_3
\end{bmatrix}
\]
(A numerical check of this Jacobian is sketched after part h.)

h) We could change the golden-section search to look for larger values at each iteration. It is far easier, however, to use the existing minimization form and find the minima of $-f(x)$, which will be the maxima of $f(x)$.
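As a quick sanity check on the Jacobian written out in part g), the analytic matrix can be compared against a finite-difference approximation at a test point. The test point and step size below are illustrative assumptions.

    import numpy as np

    # Numerical check of the Jacobian in 5g at an arbitrary test point.
    def f_vec(v):
        x1, x2, x3 = v
        return np.array([x1**3 * x3 - x2 + x2**2 * np.exp(x3) - 5,
                         x1**2 * x2 * x3 - 3 * x3**4 + x3 * np.cos(x2) + 3,
                         2 * x2**2 * x3**2 + x2 * np.exp(-2 * x1) - 8])

    def J_analytic(v):
        x1, x2, x3 = v
        return np.array([
            [3 * x1**2 * x3, 2 * x2 * np.exp(x3) - 1, x1**3 + x2**2 * np.exp(x3)],
            [2 * x1 * x2 * x3, x1**2 * x3 - x3 * np.sin(x2),
             x1**2 * x2 - 12 * x3**3 + np.cos(x2)],
            [-2 * x2 * np.exp(-2 * x1), 4 * x2 * x3**2 + np.exp(-2 * x1),
             4 * x2**2 * x3]])

    v0 = np.array([1.0, 0.5, 0.3])                 # arbitrary test point
    h = 1e-6
    J_fd = np.column_stack([(f_vec(v0 + h * np.eye(3)[:, j]) - f_vec(v0)) / h
                            for j in range(3)])    # forward differences
    print(np.max(np.abs(J_fd - J_analytic(v0))))   # small, on the order of h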
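Finally, a minimal sketch tying parts e) and h) together: steepest descent with a fixed step size $\alpha$, applied to $-f$ so that the minimizer it finds is a maximizer of $f$. The example function, step size, starting point, and tolerance are all illustrative assumptions.

    import numpy as np

    # Steepest-descent sketch for 5e), used as in 5h) to maximize f by
    # minimizing -f.  grad is the gradient of the function being minimized.
    def steepest_descent(grad, x0, alpha=0.1, tol=1e-8, max_iter=10000):
        x = np.asarray(x0, dtype=float)
        for _ in range(max_iter):
            g = grad(x)
            x = x - alpha * g                 # x^{k+1} = x^k - alpha * grad f(x^k)
            if np.linalg.norm(g) < tol:       # gradient -> 0 as it converges
                break
        return x

    # Example: maximize f(x, y) = -(x - 1)**2 - (y + 2)**2 by minimizing -f,
    # whose gradient is [2*(x - 1), 2*(y + 2)]; the maximizer is (1, -2).
    xmax = steepest_descent(lambda v: np.array([2 * (v[0] - 1), 2 * (v[1] + 2)]),
                            x0=[0.0, 0.0])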