MACM 316 Assignment 1 Solutions

1.2 Finite Precision Arithmetic

1.2:6e Rounding Arithmetic

Use four-digit rounding arithmetic to perform the following calculation. Compute the absolute error and relative error with the exact value determined to within at least 5 digits.

\begin{align}
\frac{\frac{13}{14} - \frac{6}{7}}{2e - 5.4}
&\approx \mathrm{fl}\left(\frac{\mathrm{fl}\left(\mathrm{fl}\left(\frac{13}{14}\right) - \mathrm{fl}\left(\frac{6}{7}\right)\right)}{\mathrm{fl}\left(\mathrm{fl}\left(2\,\mathrm{fl}(e)\right) - \mathrm{fl}(5.4)\right)}\right)
= \mathrm{fl}\left(\frac{\mathrm{fl}(0.9286 - 0.8571)}{\mathrm{fl}\left(\mathrm{fl}(2(2.718)) - 5.400\right)}\right) \tag{1} \\
&= \mathrm{fl}\left(\frac{\mathrm{fl}(0.9286 - 0.8571)}{\mathrm{fl}(5.436 - 5.400)}\right) \tag{2} \\
&= \mathrm{fl}\left(\frac{0.0715}{0.036}\right) \quad \text{(note that the subtractions left us with fewer than 4 digits)} \tag{3} \\
&= 1.986 \tag{4}
\end{align}

However, repeating the same calculation in 16-digit rounding arithmetic gives

\[
\frac{\frac{13}{14} - \frac{6}{7}}{2e - 5.4} \approx 1.953540139286012 \tag{5}
\]

Then the absolute error is

\[
\epsilon_a = |1.953540139286012 - 1.986| \approx 3.2 \times 10^{-2} \tag{6}
\]

and the relative error is

\[
\epsilon_r = \frac{|1.953540139286012 - 1.986|}{1.953540139286012} \approx 1.7 \times 10^{-2} = 1.7\% \tag{7}
\]

Note that the choice to report the error to two digits is somewhat arbitrary, but it is sufficient to give you an idea of how accurate the approximation is.

1.2:14a Chopping Arithmetic and the Quadratic Formula

Use four-digit chopping arithmetic and the formulae of Example 5 to find the most accurate approximations to the roots of the following quadratic equation. Compute the absolute and relative errors.

\[
\frac{1}{3}x^2 - \frac{123}{4}x + \frac{1}{6} = 0 \tag{8}
\]

Applying the quadratic formula gives

\[
x_\pm = \frac{\frac{123}{4} \pm \sqrt{\left(\frac{123}{4}\right)^2 - 4\left(\frac{1}{3}\right)\left(\frac{1}{6}\right)}}{2\left(\frac{1}{3}\right)} \tag{9}
\]

but we want to avoid subtracting nearly equal numbers of the same sign (or, equivalently, adding numbers of opposite sign), so we split this into the two roots, rewrite the problematic case, and only then do the finite precision arithmetic.
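\begin{align}
x_+ &= \frac{\frac{123}{4} + \sqrt{\left(\frac{123}{4}\right)^2 - 4\left(\frac{1}{3}\right)\left(\frac{1}{6}\right)}}{2\left(\frac{1}{3}\right)} \tag{10} \\
x_- &= \frac{\frac{123}{4} - \sqrt{\left(\frac{123}{4}\right)^2 - 4\left(\frac{1}{3}\right)\left(\frac{1}{6}\right)}}{2\left(\frac{1}{3}\right)} \tag{11} \\
&= \frac{2\left(\frac{1}{6}\right)}{\frac{123}{4} + \sqrt{\left(\frac{123}{4}\right)^2 - 4\left(\frac{1}{3}\right)\left(\frac{1}{6}\right)}} \tag{12}
\end{align}

Here $x_-$ is the problematic case: the square root is close to $\frac{123}{4}$, so (11) subtracts nearly equal numbers. Form (12) follows from multiplying the numerator and denominator of (11) by the conjugate $\frac{123}{4} + \sqrt{\cdots}$.

Now doing the calculations in finite precision,

\begin{align}
x_+ \approx \tilde{x}_+
&= \mathrm{fl}\left(\frac{\mathrm{fl}\left(\mathrm{fl}\left(\frac{123}{4}\right) + \mathrm{fl}\left(\sqrt{\mathrm{fl}\left(\mathrm{fl}\left(\mathrm{fl}\left(\frac{123}{4}\right)^2\right) - \mathrm{fl}\left(\mathrm{fl}\left(4\,\mathrm{fl}\left(\frac{1}{3}\right)\right)\mathrm{fl}\left(\frac{1}{6}\right)\right)\right)}\right)\right)}{\mathrm{fl}\left(2\,\mathrm{fl}\left(\frac{1}{3}\right)\right)}\right) \tag{13} \\
&= \mathrm{fl}\left(\frac{\mathrm{fl}\left(30.75 + \mathrm{fl}\left(\sqrt{\mathrm{fl}\left(\mathrm{fl}\left(30.75^2\right) - \mathrm{fl}\left(\mathrm{fl}(4(0.3333))(0.1666)\right)\right)}\right)\right)}{\mathrm{fl}(2(0.3333))}\right) \tag{14} \\
&= \mathrm{fl}\left(\frac{\mathrm{fl}\left(30.75 + \mathrm{fl}\left(\sqrt{\mathrm{fl}\left(945.5 - \mathrm{fl}((1.333)(0.1666))\right)}\right)\right)}{0.6666}\right) \tag{15} \\
&= \mathrm{fl}\left(\frac{\mathrm{fl}\left(30.75 + \mathrm{fl}\left(\sqrt{\mathrm{fl}(945.5 - 0.2220)}\right)\right)}{0.6666}\right) \tag{16} \\
&= \mathrm{fl}\left(\frac{\mathrm{fl}\left(30.75 + \mathrm{fl}\left(\sqrt{945.2}\right)\right)}{0.6666}\right) \tag{17} \\
&= \mathrm{fl}\left(\frac{\mathrm{fl}(30.75 + 30.74)}{0.6666}\right) \tag{18} \\
&= \mathrm{fl}\left(\frac{61.49}{0.6666}\right) \tag{19} \\
&= 92.24 \tag{20}
\end{align}

Similarly, reusing some of our intermediate results from above,

\begin{align}
x_- \approx \tilde{x}_-
&= \mathrm{fl}\left(\frac{\mathrm{fl}\left(2\,\mathrm{fl}\left(\frac{1}{6}\right)\right)}{\mathrm{fl}\left(\mathrm{fl}\left(\frac{123}{4}\right) + \mathrm{fl}\left(\sqrt{\mathrm{fl}\left(\mathrm{fl}\left(\mathrm{fl}\left(\frac{123}{4}\right)^2\right) - \mathrm{fl}\left(\mathrm{fl}\left(4\,\mathrm{fl}\left(\frac{1}{3}\right)\right)\mathrm{fl}\left(\frac{1}{6}\right)\right)\right)}\right)\right)}\right) \tag{21} \\
&= \mathrm{fl}\left(\frac{\mathrm{fl}(2(0.1666))}{61.49}\right) \tag{22} \\
&= \mathrm{fl}\left(\frac{0.3332}{61.49}\right) \tag{23} \\
&= 0.005418 \tag{25}
\end{align}

Repeating these calculations with 16-digit chopping arithmetic, the solution is

\[
x_+ = 92.24457962731231, \qquad x_- = 0.005420372687697272 \tag{26}
\]

The absolute errors are

\[
\epsilon_{a+} = |x_+ - \tilde{x}_+| \approx 4.6 \times 10^{-3}, \qquad
\epsilon_{a-} = |x_- - \tilde{x}_-| \approx 2.4 \times 10^{-6} \tag{27}
\]

and the relative errors are

\[
\epsilon_{r+} = \frac{|x_+ - \tilde{x}_+|}{|x_+|} \approx 5.0 \times 10^{-5} = 0.0050\%, \qquad
\epsilon_{r-} = \frac{|x_- - \tilde{x}_-|}{|x_-|} \approx 4.4 \times 10^{-4} = 0.044\% \tag{28}
\]

1.2:18 Finite Precision Taylor Series

We wish to approximate $e^{-5}$ using the 9th order Taylor polynomial,

\[
e^{-5} \approx \sum_{i=0}^{n} \frac{(-5)^i}{i!} \approx \mathrm{sum}_n \tag{29}
\]

where $\mathrm{sum}_n$ is defined by the recursion relation

\begin{align}
\mathrm{sum}_{-1} &= 0 \tag{30} \\
\mathrm{sum}_n &= \mathrm{fl}\left(\mathrm{sum}_{n-1} + \mathrm{fl}\left(\frac{\mathrm{fl}\left((-5)^n\right)}{\mathrm{fl}(n!)}\right)\right) \tag{31}
\end{align}

Note that there are two approximations here. The first is that we do not take the limit $n \to \infty$. The error associated with this approximation is called truncation error, and it should get smaller as we add terms to the series. The second approximation is that we use finite precision arithmetic to evaluate this truncated series.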
The error associated with this approximation is called roundoff error. Thus, when we evaluate the relative error in $\mathrm{sum}_i$,

\[
\epsilon_{r_i} = \frac{\left|e^{-5} - \mathrm{sum}_i\right|}{e^{-5}} \tag{32}
\]

we should expect contributions from both the roundoff and truncation error. The table below shows the intermediate results in computing $\mathrm{sum}_9$. The relative error $\epsilon_{r_i}$ grows through $i = 4$ and is still about $2.8 \times 10^{2}$ at $i = 9$. The truncation error should eventually decrease as we add terms, so the lack of any accurate digits tells us that roundoff error is contaminating the calculation.

i   fl((-5)^i)   fl(i!)     fl(fl((-5)^i)/fl(i!))   sum_i       ε_{r_i}
0    1.00e+00    1.00e+00    1.00e+00                1.00e+00   1.5e+02
1   -5.00e+00    1.00e+00   -5.00e+00               -4.00e+00   5.9e+02
2    2.50e+01    2.00e+00    1.25e+01                8.50e+00   1.3e+03
3   -1.25e+02    6.00e+00   -2.08e+01               -1.23e+01   1.8e+03
4    6.25e+02    2.40e+01    2.60e+01                1.37e+01   2.0e+03
5   -3.12e+03    1.20e+02   -2.60e+01               -1.23e+01   1.8e+03
6    1.56e+04    7.20e+02    2.16e+01                9.30e+00   1.4e+03
7   -7.81e+04    5.04e+03   -1.55e+01               -6.20e+00   9.2e+02
8    3.90e+05    4.03e+04    9.67e+00                3.47e+00   5.1e+02
9   -1.95e+06    3.62e+05   -5.38e+00               -1.91e+00   2.8e+02

The reason this happens is that the sign of $\mathrm{sum}_{i-1}$ is always opposite to the sign of $\mathrm{fl}\left(\frac{\mathrm{fl}((-5)^i)}{\mathrm{fl}(i!)}\right)$, so Eq. (31) adds two numbers of opposite sign at each step. As we know, this is prone to loss of precision. The solution is to compute the series in a different way that avoids adding numbers of opposite sign. This can be achieved by instead computing

\[
e^{-5} = \frac{1}{e^{5}} \approx \frac{1}{\sum_{i=0}^{n} \frac{5^i}{i!}} \approx \frac{1}{\mathrm{sum}_n} \tag{33}
\]

where $\mathrm{sum}_i$ is now defined by the recursion relation

\begin{align}
\mathrm{sum}_{-1} &= 0 \tag{34} \\
\mathrm{sum}_i &= \mathrm{fl}\left(\mathrm{sum}_{i-1} + \mathrm{fl}\left(\frac{\mathrm{fl}\left(5^i\right)}{\mathrm{fl}(i!)}\right)\right) \tag{35}
\end{align}

The table below shows the intermediate results in computing $\frac{1}{\mathrm{sum}_9}$ by this method. The first thing to notice is that there are no "−" signs in this table.
The second thing to notice is that $\epsilon_{r_i}$ decreases as we add terms, suggesting that truncation error is now more important than roundoff error.

i   fl(5^i)     fl(i!)     fl(fl(5^i)/fl(i!))   1/sum_i     ε_{r_i}
0   1.00e+00    1.00e+00   1.00e+00             1.00e+00    1.5e+02
1   5.00e+00    1.00e+00   5.00e+00             1.67e-01    2.4e+01
2   2.50e+01    2.00e+00   1.25e+01             5.40e-02    7.0e+00
3   1.25e+02    6.00e+00   2.08e+01             2.54e-02    2.8e+00
4   6.25e+02    2.40e+01   2.60e+01             1.53e-02    1.3e+00
5   3.12e+03    1.20e+02   2.60e+01             1.10e-02    6.3e-01
6   1.56e+04    7.20e+02   2.16e+01             8.93e-03    3.3e-01
7   7.81e+04    5.04e+03   1.55e+01             7.87e-03    1.7e-01
8   3.90e+05    4.03e+04   9.67e+00             7.35e-03    9.1e-02
9   1.95e+06    3.62e+05   5.38e+00             7.09e-03    5.3e-02

In conclusion, the second method is a significant improvement over the first: its relative errors are orders of magnitude smaller (about $5.3 \times 10^{-2}$ versus $2.8 \times 10^{2}$ at $i = 9$). We can also reduce these errors further by adding more terms to the series, whereas in the first case it was not clear whether adding terms led to any appreciable improvement.

1.3 Convergence

1.3:6b convergence as n → ∞

Find the rate of convergence of

\[
\lim_{n \to \infty} \sin\frac{1}{n^2} = 0 \tag{36}
\]

By making the substitution $h = \frac{1}{n}$, we have the related problem of finding the rate of convergence of

\[
\lim_{h \to 0} \sin h^2 = 0 \tag{37}
\]

Thus we have a function

\[
f(h) = \sin h^2 \tag{38}
\]

which trivially satisfies $f(0) = 0$. We want to find the largest value of $p$ such that, for some constant $K$,

\[
|f(h) - f(0)| = |f(h)| \le K\,|h^p| \quad \text{for small } h \tag{39}
\]

We know from Taylor's theorem (1.14) that

\begin{align}
f(h) &= f(0) + h f'(0) + \frac{h^2}{2} f''(c) \quad \text{for some } c \text{ between } 0 \text{ and } h \tag{40} \\
&= \sin(x^2)\Big|_{x=0} + h \cdot 2x\cos(x^2)\Big|_{x=0} + \frac{h^2}{2}\left(-4\sin(c^2)c^2 + 2\cos(c^2)\right) \tag{41} \\
&= h^2\left(-2\sin(c^2)c^2 + \cos(c^2)\right) \tag{42}
\end{align}

So, using $|\sin(c^2)| \le c^2$, $|\cos(c^2)| \le 1$ and $|c| \le |h|$,

\begin{align}
|f(h)| &= \left|h^2\left(-2\sin(c^2)c^2 + \cos(c^2)\right)\right| \tag{43} \\
&\le h^2\left(2c^2\left|\sin(c^2)\right| + \left|\cos(c^2)\right|\right) \tag{44} \\
&\le h^2\left(2c^2 \cdot c^2 + 1\right) \tag{45} \\
&\le h^2\left(2h^4 + 1\right) \tag{46} \\
&\le h^2(2 + 1) \quad \text{for } |h| \le 1 \tag{47} \\
&= 3h^2 \tag{48}
\end{align}

This tells us that Eq.
(39) holds for $K = 3$, $p = 2$, and by "small" $h$ we mean $|h| \le 1$. To show that 2 is indeed the largest value of $p$ for which we can make Eq. (39) hold, we simply expand the Taylor series to one more order:

\begin{align}
f(h) &= f(0) + h f'(0) + \frac{h^2}{2} f''(0) + \frac{h^3}{6} f^{(3)}(c) \quad \text{for some } c \text{ between } 0 \text{ and } h \tag{49} \\
&= 0 + h^2\left(-2\sin(x^2)x^2 + \cos(x^2)\right)\Big|_{x=0} + \frac{h^3}{6} f^{(3)}(c) \tag{50} \\
&= h^2 + \frac{h^3}{6} f^{(3)}(c) \tag{51}
\end{align}

In other words, the $h^2$ term does not vanish like the ones before it, so we cannot use the same argument to show that $|f(h)| \le K|h^3|$. So transforming

\[
|f(h) - f(0)| = |f(h)| \le 3h^2, \quad |h| \le 1 \tag{53}
\]

back to our original problem, Eq. (36), via $h = \frac{1}{n}$, we have

\[
\left|\sin\frac{1}{n^2} - \lim_{n \to \infty}\sin\frac{1}{n^2}\right| = \left|\sin\frac{1}{n^2}\right| \le 3\,\frac{1}{n^2}, \quad n \ge 1 \tag{54}
\]

so the sequence converges like $\frac{1}{n^2}$, or

\[
\sin\frac{1}{n^2} = O\left(\frac{1}{n^2}\right) \tag{55}
\]

1.3:7c convergence as h → 0

Find the rate of convergence of

\[
\lim_{h \to 0} \frac{\sin(h) - h\cos(h)}{h} = 0 \tag{56}
\]

We want to examine the behaviour of the function

\[
f(h) = \frac{\sin(h) - h\cos(h)}{h} \tag{57}
\]

near $h = 0$. We need to be careful since we have $h$ in the denominator, so we expand just the numerator in a Taylor series. The lowest order nontrivial Taylor expansions about $h = 0$ for $\sin$ and $\cos$ are

\begin{align}
\sin(h) &= h - \frac{h^3}{6}\cos(c_1) \quad \text{for } c_1 \text{ between } 0 \text{ and } h \tag{58} \\
\cos(h) &= 1 - \frac{h^2}{2}\cos(c_2) \quad \text{for } c_2 \text{ between } 0 \text{ and } h \tag{59}
\end{align}

So, taking $f(0)$ to be the limiting value 0, we have

\begin{align}
|f(h) - f(0)| &= |f(h)| \tag{60} \\
&= \left|\frac{h - \frac{h^3}{6}\cos(c_1) - h\left(1 - \frac{h^2}{2}\cos(c_2)\right)}{h}\right| \tag{61} \\
&= \left|1 - \frac{h^2}{6}\cos(c_1) - 1 + \frac{h^2}{2}\cos(c_2)\right| \tag{62} \\
&= \left|-\frac{1}{6}\cos(c_1) + \frac{1}{2}\cos(c_2)\right| h^2 \tag{63} \\
&\le \left(\frac{1}{6}\left|\cos(c_1)\right| + \frac{1}{2}\left|\cos(c_2)\right|\right) h^2 \tag{64} \\
&\le \frac{2}{3}h^2 \tag{65}
\end{align}

So

\[
\frac{\sin(h) - h\cos(h)}{h} = O\left(h^2\right) \tag{66}
\]

1.3:14 Orders of convergence

Make a table listing $h$, $h^2$, $h^3$ and $h^4$ for $h = 0.5, 0.1, 0.01, 0.001$ and discuss the varying rates of convergence.

h          h^2        h^3        h^4
5.00e-01   2.50e-01   1.25e-01   6.25e-02
1.00e-01   1.00e-02   1.00e-03   1.00e-04
1.00e-02   1.00e-04   1.00e-06   1.00e-08
1.00e-03   1.00e-06   1.00e-09   1.00e-12

Clearly, the higher the power, the faster the convergence to zero (i.e., $h > h^2 > h^3 > h^4$ for $0 < h < 1$).
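The table above can be regenerated in a few lines of Python. This is a minimal sketch for illustration only; the helper name `power_table` and its default arguments are assumptions of this example, not part of the assignment.

```python
def power_table(hs=(0.5, 0.1, 0.01, 0.001), powers=(1, 2, 3, 4)):
    """Return one row per h, holding h**p for each p, formatted as d.dde+xx."""
    return [[f"{h**p:.2e}" for p in powers] for h in hs]


# Print the rows in the same layout as the table: h, h^2, h^3, h^4.
for row in power_table():
    print("   ".join(row))
```

Each row is computed in ordinary double precision and only rounded for display, so the printed values match the table to the three digits shown.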
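For every order of magnitude by which we decrease $h$, $h^p$ decreases by $p$ orders of magnitude.

Now suppose that $0 < q < p$ and that $F(h) = L + O(h^p)$. Show that $F(h) = L + O(h^q)$.

We know that

\[
|h^p| \le |h^q| \quad \text{for } q \le p, \ |h| \le 1 \tag{67}
\]

and the statement that $F(h) = L + O(h^p)$ means that

\[
|F(h) - L| \le K\,|h^p| \quad \text{for small } h \tag{68}
\]

So we can say

\begin{align}
|F(h) - L| &\le K\,|h^p| \le K\,|h^q| \quad \text{for } |h| \le 1 \tag{69} \\
|F(h) - L| &\le K\,|h^q| \tag{70}
\end{align}

which translates back into

\[
F(h) = L + O\left(h^q\right) \tag{71}
\]

6.1 Linear Algebra

6.1:10 Singular matrices

Given the linear system

\[
\begin{bmatrix} 1 & -1 & \alpha \\ -1 & 2 & -\alpha \\ \alpha & 1 & 1 \end{bmatrix} x = \begin{bmatrix} -2 \\ 3 \\ 2 \end{bmatrix} \tag{72}
\]

for what values of $\alpha$ does the system have no solutions, and for what values does it have infinitely many solutions?

Finding the values of $\alpha$ for which the determinant vanishes only tells us that the system has either no solutions or infinitely many; it does not distinguish between the two cases. Furthermore, part c asks us to solve the system for general $\alpha$, so we will do that first and see which values of $\alpha$ give us problems.

\begin{align}
\left[\begin{array}{rrr|r} 1 & -1 & \alpha & -2 \\ -1 & 2 & -\alpha & 3 \\ \alpha & 1 & 1 & 2 \end{array}\right]
&\mapsto \left[\begin{array}{ccc|c} 1 & -1 & \alpha & -2 \\ 0 & 1 & 0 & 1 \\ 0 & 1+\alpha & 1-\alpha^2 & 2(1+\alpha) \end{array}\right] \tag{73} \\
&\mapsto \left[\begin{array}{ccc|c} 1 & -1 & \alpha & -2 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1-\alpha^2 & 1+\alpha \end{array}\right] \tag{74} \\
&\mapsto \left[\begin{array}{ccc|c} 1 & -1 & \alpha & -2 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & \frac{1}{1-\alpha} \end{array}\right] \quad (\alpha \ne \pm 1) \tag{75} \\
&\mapsto \left[\begin{array}{ccc|c} 1 & -1 & 0 & -2 - \frac{\alpha}{1-\alpha} \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & \frac{1}{1-\alpha} \end{array}\right] \tag{76} \\
&\mapsto \left[\begin{array}{ccc|c} 1 & 0 & 0 & -1 - \frac{\alpha}{1-\alpha} \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & \frac{1}{1-\alpha} \end{array}\right] \tag{77} \\
&= \left[\begin{array}{ccc|c} 1 & 0 & 0 & -\frac{1}{1-\alpha} \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & \frac{1}{1-\alpha} \end{array}\right] \tag{78}
\end{align}

So

\begin{align}
x_1 &= -\frac{1}{1-\alpha} \tag{79} \\
x_2 &= 1 \tag{80} \\
x_3 &= \frac{1}{1-\alpha} \tag{81}
\end{align}

We can see in the third row of Eq. (74) that $\alpha = \pm 1$ will cause problems. If $\alpha = 1$, we have an inconsistent system (no solutions):

\[
\left[\begin{array}{rrr|r} 1 & -1 & 1 & -2 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 0 & 2 \end{array}\right] \tag{82}
\]

If $\alpha = -1$, we have no constraint on $x_3$ (infinitely many solutions):

\[
\left[\begin{array}{rrr|r} 1 & -1 & -1 & -2 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{array}\right] \tag{83}
\]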
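The case analysis for system (72) can be checked mechanically. The sketch below is an illustration and not part of the original solution: it row-reduces the augmented matrix in exact rational arithmetic using Python's standard `fractions.Fraction` and reports whether the system has a unique solution, no solution, or infinitely many. The function names `augmented` and `classify` are made up for this example.

```python
from fractions import Fraction


def augmented(alpha):
    """Augmented matrix [A | b] of system (72) for a given alpha (exact rationals)."""
    a = Fraction(alpha)
    return [
        [Fraction(1), Fraction(-1), a, Fraction(-2)],
        [Fraction(-1), Fraction(2), -a, Fraction(3)],
        [a, Fraction(1), Fraction(1), Fraction(2)],
    ]


def classify(alpha):
    """Gauss-Jordan elimination; returns ('unique', x), ('none', None) or ('infinite', None)."""
    m = augmented(alpha)
    n = 3
    pivot_row = 0
    for col in range(n):
        # Look for a nonzero pivot in this column at or below pivot_row.
        p = next((r for r in range(pivot_row, n) if m[r][col] != 0), None)
        if p is None:
            continue  # no pivot here: this column corresponds to a free variable
        m[pivot_row], m[p] = m[p], m[pivot_row]
        piv = m[pivot_row][col]
        m[pivot_row] = [v / piv for v in m[pivot_row]]
        # Clear the pivot column in every other row.
        for r in range(n):
            if r != pivot_row and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [vr - factor * vp for vr, vp in zip(m[r], m[pivot_row])]
        pivot_row += 1
    # A row of the form 0 0 0 | c with c != 0 means the system is inconsistent.
    if any(all(v == 0 for v in row[:n]) and row[n] != 0 for row in m):
        return "none", None
    if pivot_row < n:
        return "infinite", None
    return "unique", [row[n] for row in m]


for alpha in (2, 1, -1):
    print(alpha, classify(alpha))
```

For $\alpha = 2$ the routine returns $x = (1, 1, -1)$, which agrees with Eqs. (79)-(81), while $\alpha = 1$ and $\alpha = -1$ fall into the "none" and "infinite" branches, matching Eqs. (82) and (83). Exact fractions are used rather than floats so that the zero tests in the elimination are not affected by roundoff.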