MACM 316
Assignment 1 Solutions
1.2 Finite Precision Arithmetic
1.2:6e Rounding Arithmetic
Use four-digit rounding arithmetic to perform the following calculation. Compute the absolute error and relative
error with the exact value determined to within at least 5 digits.
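\begin{align*}
\frac{\frac{13}{14} - \frac{6}{7}}{2e - 5.4}
&\approx \mathrm{fl}\left( \frac{\mathrm{fl}\left( \mathrm{fl}\left(\frac{13}{14}\right) - \mathrm{fl}\left(\frac{6}{7}\right) \right)}{\mathrm{fl}\left( \mathrm{fl}\left(2\,\mathrm{fl}(e)\right) - \mathrm{fl}(5.4) \right)} \right) \tag{1} \\
&= \mathrm{fl}\left( \frac{\mathrm{fl}(0.9286 - 0.8571)}{\mathrm{fl}\left( \mathrm{fl}(2(2.718)) - 5.400 \right)} \right) \\
&= \mathrm{fl}\left( \frac{\mathrm{fl}(0.9286 - 0.8571)}{\mathrm{fl}(5.436 - 5.400)} \right) \tag{2} \\
&= \mathrm{fl}\left( \frac{0.0715}{0.036} \right) \tag{3} \\
&= 1.986 \tag{4}
\end{align*}
Note that in Eq. (3) the subtractions left us with fewer than 4 digits.
However, repeating the same calculation in 16-digit rounding arithmetic gives
\[
\frac{\frac{13}{14} - \frac{6}{7}}{2e - 5.4} \approx 1.953540139286012 \tag{5}
\]
Then the absolute error is
\[
\epsilon_a = |1.953540139286012 - 1.986| \approx 3.2 \times 10^{-2} \tag{6}
\]
and the relative error is
\[
\epsilon_r = \frac{|1.953540139286012 - 1.986|}{1.953540139286012} \approx 1.7 \times 10^{-2} = 1.7\% \tag{7}
\]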
Note the choice to represent the error in two digits is somewhat arbitrary, but should be sufficient to give you an
idea of how accurate the approximation is.
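The four-digit computation can also be reproduced mechanically. The following is a minimal Python sketch, assuming that four-digit rounding arithmetic can be modelled with the standard `decimal` module (precision 4, round-half-up); the names `R4`, `approx` and `exact` are illustrative and not part of the assignment.

```python
import math
from decimal import Decimal, Context, ROUND_HALF_UP

# Assumption: four-digit rounding arithmetic is modelled by a decimal
# context that rounds the result of every operation to 4 significant digits.
R4 = Context(prec=4, rounding=ROUND_HALF_UP)

a = R4.divide(Decimal(13), Decimal(14))         # fl(13/14) = 0.9286
b = R4.divide(Decimal(6), Decimal(7))           # fl(6/7)   = 0.8571
num = R4.subtract(a, b)                         # 0.0715

e4 = R4.create_decimal(repr(math.e))            # fl(e) = 2.718
two_e = R4.multiply(Decimal(2), e4)             # fl(2 fl(e)) = 5.436
den = R4.subtract(two_e, R4.create_decimal("5.4"))  # 0.036

approx = R4.divide(num, den)                    # 1.986

# Reference value computed in double precision (roughly 16 digits).
exact = (13 / 14 - 6 / 7) / (2 * math.e - 5.4)
abs_err = abs(exact - float(approx))
rel_err = abs_err / abs(exact)
print(approx, f"{abs_err:.1e}", f"{rel_err:.1e}")
```

Routing every operation through the context, rather than rounding only the final answer, is what makes the loss of digits in the two subtractions visible.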
1.2:14a Chopping Arithmetic and the Quadratic Formula
Use four-digit chopping arithmetic and the formulae of Example 5 to find the most accurate approximations to the
roots of the following quadratic equations. Compute the absolute and relative errors.
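\[
\frac{1}{3}x^2 - \frac{123}{4}x + \frac{1}{6} = 0 \tag{8}
\]
Applying the quadratic formula
\[
x_\pm = \frac{\frac{123}{4} \pm \sqrt{\left(\frac{123}{4}\right)^2 - 4\left(\frac{1}{3}\right)\left(\frac{1}{6}\right)}}{2\left(\frac{1}{3}\right)} \tag{9}
\]
but we want to avoid subtracting nearly equal numbers of the same sign (or adding nearly equal numbers of opposite sign), so we split this up, modify the problematic case by rationalizing the numerator, and then do the finite precision arithmetic.
\[
x_+ = \frac{\frac{123}{4} + \sqrt{\left(\frac{123}{4}\right)^2 - 4\left(\frac{1}{3}\right)\left(\frac{1}{6}\right)}}{2\left(\frac{1}{3}\right)} \tag{10}
\]
\begin{align*}
x_- &= \frac{\frac{123}{4} - \sqrt{\left(\frac{123}{4}\right)^2 - 4\left(\frac{1}{3}\right)\left(\frac{1}{6}\right)}}{2\left(\frac{1}{3}\right)} \tag{11} \\
&= \frac{2\left(\frac{1}{6}\right)}{\frac{123}{4} + \sqrt{\left(\frac{123}{4}\right)^2 - 4\left(\frac{1}{3}\right)\left(\frac{1}{6}\right)}} \tag{12}
\end{align*}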
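Now doing the calculations in finite precision
\begin{align*}
x_+ \approx \tilde{x}_+
&= \mathrm{fl}\left( \frac{\mathrm{fl}\left( \mathrm{fl}\left(\frac{123}{4}\right) + \mathrm{fl}\left( \sqrt{ \mathrm{fl}\left( \mathrm{fl}\left( \mathrm{fl}\left(\frac{123}{4}\right)^2 \right) - \mathrm{fl}\left( \mathrm{fl}\left(4\,\mathrm{fl}\left(\frac{1}{3}\right)\right) \mathrm{fl}\left(\frac{1}{6}\right) \right) \right) } \right) \right)}{\mathrm{fl}\left( 2\,\mathrm{fl}\left(\frac{1}{3}\right) \right)} \right) \tag{13} \\
&= \mathrm{fl}\left( \frac{\mathrm{fl}\left( 30.75 + \mathrm{fl}\left( \sqrt{ \mathrm{fl}\left( \mathrm{fl}\left(30.75^2\right) - \mathrm{fl}\left( \mathrm{fl}(4(0.3333))(0.1666) \right) \right) } \right) \right)}{\mathrm{fl}(2(0.3333))} \right) \tag{14} \\
&= \mathrm{fl}\left( \frac{\mathrm{fl}\left( 30.75 + \mathrm{fl}\left( \sqrt{ \mathrm{fl}\left( 945.5 - \mathrm{fl}((1.333)(0.1666)) \right) } \right) \right)}{0.6666} \right) \tag{15} \\
&= \mathrm{fl}\left( \frac{\mathrm{fl}\left( 30.75 + \mathrm{fl}\left( \sqrt{ \mathrm{fl}(945.5 - 0.2220) } \right) \right)}{0.6666} \right) \tag{16} \\
&= \mathrm{fl}\left( \frac{\mathrm{fl}\left( 30.75 + \mathrm{fl}\left( \sqrt{945.2} \right) \right)}{0.6666} \right) \tag{17} \\
&= \mathrm{fl}\left( \frac{\mathrm{fl}(30.75 + 30.74)}{0.6666} \right) \tag{18} \\
&= \mathrm{fl}\left( \frac{61.49}{0.6666} \right) \tag{19} \\
&= 92.24 \tag{20}
\end{align*}
Similarly, reusing some of our intermediate results from above,
\begin{align*}
x_- \approx \tilde{x}_-
&= \mathrm{fl}\left( \frac{\mathrm{fl}\left( 2\,\mathrm{fl}\left(\frac{1}{6}\right) \right)}{\mathrm{fl}\left( \mathrm{fl}\left(\frac{123}{4}\right) + \mathrm{fl}\left( \sqrt{ \mathrm{fl}\left( \mathrm{fl}\left( \mathrm{fl}\left(\frac{123}{4}\right)^2 \right) - \mathrm{fl}\left( \mathrm{fl}\left(4\,\mathrm{fl}\left(\frac{1}{3}\right)\right) \mathrm{fl}\left(\frac{1}{6}\right) \right) \right) } \right) \right)} \right) \tag{21} \\
&= \mathrm{fl}\left( \frac{\mathrm{fl}(2(0.1666))}{\mathrm{fl}(30.75 + 30.74)} \right) \tag{22} \\
&= \mathrm{fl}\left( \frac{\mathrm{fl}(2(0.1666))}{61.49} \right) \tag{23} \\
&= \mathrm{fl}\left( \frac{0.3332}{61.49} \right) \tag{24} \\
&= 0.005418 \tag{25}
\end{align*}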
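Repeating these calculations with 16-digit chopping arithmetic, the solution is
\[
x_+ = 92.24457962731231, \qquad x_- = 0.005420372687697272 \tag{26}
\]
The absolute errors are
\[
\epsilon_{a+} = |x_+ - \tilde{x}_+| \approx 4.6 \times 10^{-3}, \qquad
\epsilon_{a-} = |x_- - \tilde{x}_-| \approx 2.4 \times 10^{-6} \tag{27}
\]
and the relative errors are
\[
\epsilon_{r+} = \frac{|x_+ - \tilde{x}_+|}{|x_+|} \approx 5.0 \times 10^{-5} = 0.0050\%, \qquad
\epsilon_{r-} = \frac{|x_- - \tilde{x}_-|}{|x_-|} \approx 4.4 \times 10^{-4} = 0.044\% \tag{28}
\]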
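To see why the rationalized form in Eq. (12) matters, the chopped calculation can be replayed in code. This is a minimal Python sketch, assuming four-digit chopping can be modelled by a `decimal` context with `ROUND_DOWN`; the names `C4`, `WIDE`, `chopped_sqrt` and `x_minus_naive` are illustrative, and the naive variant is included only for comparison.

```python
from decimal import Decimal, Context, ROUND_DOWN

# Assumption: four-digit chopping is modelled by truncating every
# intermediate result to 4 significant digits.
C4 = Context(prec=4, rounding=ROUND_DOWN)
WIDE = Context(prec=30)  # working precision used only for the square root

def chopped_sqrt(x):
    # Decimal.sqrt always rounds half-even, so take the root at high
    # precision first and chop afterwards.
    return C4.plus(x.sqrt(WIDE))

neg_b = C4.divide(Decimal(123), Decimal(4))     # fl(123/4) = 30.75
a = C4.divide(Decimal(1), Decimal(3))           # fl(1/3)   = 0.3333
c = C4.divide(Decimal(1), Decimal(6))           # fl(1/6)   = 0.1666

b_squared = C4.multiply(neg_b, neg_b)           # 945.5
four_ac = C4.multiply(C4.multiply(Decimal(4), a), c)  # 0.2220
disc = C4.subtract(b_squared, four_ac)          # 945.2
root = chopped_sqrt(disc)                       # 30.74

two_a = C4.multiply(Decimal(2), a)              # 0.6666
two_c = C4.multiply(Decimal(2), c)              # 0.3332
total = C4.add(neg_b, root)                     # 61.49

x_plus = C4.divide(total, two_a)                # Eq. (10)
x_minus = C4.divide(two_c, total)               # Eq. (12), rationalized
x_minus_naive = C4.divide(C4.subtract(neg_b, root), two_a)  # Eq. (11) as written

print(x_plus, x_minus, x_minus_naive)
```

In this sketch the naive form subtracts 30.74 from 30.75 and keeps a single significant digit, so its result is far from the small root, while the rationalized form only ever adds numbers of the same sign.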
1.2:18 Finite Precision Taylor Series
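We wish to approximate $e^{-5}$ using the 9th order Taylor polynomial.
\[
e^{-5} \approx \sum_{i=0}^{n} \frac{(-5)^i}{i!} \approx \mathrm{sum}_n \tag{29}
\]
where $\mathrm{sum}_n$ is defined by the recursion relation
\[
\mathrm{sum}_{-1} = 0 \tag{30}
\]
\[
\mathrm{sum}_n = \mathrm{fl}\left( \mathrm{sum}_{n-1} + \mathrm{fl}\left( \frac{\mathrm{fl}\left((-5)^n\right)}{\mathrm{fl}(n!)} \right) \right) \tag{31}
\]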
Note that there are two approximations here. The first is the fact that we do not take the limit n → ∞. The error
associated with this approximation is called truncation error. This should get smaller as we add terms to the
series.
The second approximation is the fact that we use finite precision arithmetic to evaluate this truncated series.
The error associated with this approximation is called roundoff error.
Thus, when we evaluate the relative error in $\mathrm{sum}_i$
\[
\epsilon_{r_i} = \frac{|e^{-5} - \mathrm{sum}_i|}{e^{-5}} \tag{32}
\]
we should expect contributions from both the roundoff and truncation error.
The table below shows the intermediate results in computing $\mathrm{sum}_9$. Since $\epsilon_{r_i}$ increases as we add terms to the
series, we know our calculation is being dominated by roundoff error because the truncation error should be
decreasing as we add terms.
 i    fl((-5)^i)     fl(i!)      fl(fl((-5)^i)/fl(i!))    sum_i        ε_{r_i}
 0     1.00e+00     1.00e+00          1.00e+00            1.00e+00     1.5e+02
 1    -5.00e+00     1.00e+00         -5.00e+00           -4.00e+00     5.9e+02
 2     2.50e+01     2.00e+00          1.25e+01            8.50e+00     1.3e+03
 3    -1.25e+02     6.00e+00         -2.08e+01           -1.23e+01     1.8e+03
 4     6.25e+02     2.40e+01          2.60e+01            1.37e+01     2.0e+03
 5    -3.12e+03     1.20e+02         -2.60e+01           -1.23e+01     1.8e+03
 6     1.56e+04     7.20e+02          2.16e+01            9.30e+00     1.4e+03
 7    -7.81e+04     5.04e+03         -1.55e+01           -6.20e+00     9.2e+02
 8     3.90e+05     4.03e+04          9.67e+00            3.47e+00     5.1e+02
 9    -1.95e+06     3.62e+05         -5.38e+00           -1.91e+00     2.8e+02
The reason that this is happening is that the sign of $\mathrm{sum}_{i-1}$ is always opposite the sign of $\mathrm{fl}\left( \frac{\mathrm{fl}((-5)^i)}{\mathrm{fl}(i!)} \right)$, so
Eq. (31) is adding two numbers of opposite sign at each step. As we know, this is prone to loss of precision.
The solution is to compute the series in a different way which avoids adding numbers of opposite sign. This can
be achieved by instead computing
\[
e^{-5} = \frac{1}{e^{5}} \approx \frac{1}{\sum_{i=0}^{n} \frac{5^i}{i!}} \approx \frac{1}{\mathrm{sum}_n} \tag{33}
\]
where $\mathrm{sum}_i$ is defined by the recursion relation
\[
\mathrm{sum}_{-1} = 0 \tag{34}
\]
\[
\mathrm{sum}_i = \mathrm{fl}\left( \mathrm{sum}_{i-1} + \mathrm{fl}\left( \frac{\mathrm{fl}\left(5^i\right)}{\mathrm{fl}(i!)} \right) \right) \tag{35}
\]
The table below shows the intermediate results in computing $\frac{1}{\mathrm{sum}_9}$ by this method. The first thing to notice is
that there are no “−” signs in this table. The second thing to notice is that $\epsilon_{r_i}$ decreases as we add terms, suggesting
that truncation error is now more important.
 i    fl(5^i)       fl(i!)      fl(fl(5^i)/fl(i!))    1/sum_i      ε_{r_i}
 0    1.00e+00     1.00e+00         1.00e+00          1.00e+00     1.5e+02
 1    5.00e+00     1.00e+00         5.00e+00          1.67e-01     2.4e+01
 2    2.50e+01     2.00e+00         1.25e+01          5.40e-02     7.0e+00
 3    1.25e+02     6.00e+00         2.08e+01          2.54e-02     2.8e+00
 4    6.25e+02     2.40e+01         2.60e+01          1.53e-02     1.3e+00
 5    3.12e+03     1.20e+02         2.60e+01          1.10e-02     6.3e-01
 6    1.56e+04     7.20e+02         2.16e+01          8.93e-03     3.3e-01
 7    7.81e+04     5.04e+03         1.55e+01          7.87e-03     1.7e-01
 8    3.90e+05     4.03e+04         9.67e+00          7.35e-03     9.1e-02
 9    1.95e+06     3.62e+05         5.38e+00          7.09e-03     5.3e-02
In conclusion, we would have to say that the second method is a significant improvement over the first as the
relative errors generated by this method are a lot smaller. We also have the ability to further reduce these errors by
adding more terms to the series whereas it wasn’t clear whether or not this led to appreciable improvements in the
first case.
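The two recursions, Eq. (31) and Eq. (35), can be compared side by side in code. This is a minimal Python sketch, assuming three-digit chopping (inferred from the tabulated values, not stated in the text) modelled with the `decimal` module; `C3` and `chopped_partial_sum` are illustrative names. Because the exact way each intermediate value was chopped is an assumption, individual entries may differ from the tables in the last digit.

```python
import math
from decimal import Decimal, Context, ROUND_DOWN

# Assumption: three-digit chopping, inferred from the tabulated values.
C3 = Context(prec=3, rounding=ROUND_DOWN)

def chopped_partial_sum(x, n):
    """Accumulate fl(fl(x^i) / fl(i!)) for i = 0..n, chopping after each step."""
    total = Decimal(0)
    for i in range(n + 1):
        power = C3.plus(Decimal(x) ** i)            # fl(x^i)
        fact = C3.plus(Decimal(math.factorial(i)))  # fl(i!)
        term = C3.divide(power, fact)               # fl(fl(x^i) / fl(i!))
        total = C3.add(total, term)                 # fl(sum + term)
    return total

exact = math.exp(-5)
direct = float(chopped_partial_sum(-5, 9))          # Eq. (31): alternating signs
reciprocal = 1 / float(chopped_partial_sum(5, 9))   # Eq. (35): all terms positive

for name, value in (("direct", direct), ("reciprocal", reciprocal)):
    rel_err = abs(exact - value) / exact
    print(f"{name:>10}  {value: .3e}  {rel_err:.1e}")
```

Increasing `n` in the second call is a quick way to check the claim that the reciprocal form keeps improving as terms are added.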
1.3 Convergence
1.3:6b convergence as n → ∞
Find the rate of convergence of
\[
\lim_{n \to \infty} \sin\frac{1}{n^2} = 0 \tag{36}
\]
By making the substitution $h = \frac{1}{n}$, we have the related problem of finding the rate of convergence of
\[
\lim_{h \to 0} \sin h^2 = 0 \tag{37}
\]
Thus we have a function
\[
f(h) = \sin h^2 \tag{38}
\]
that trivially satisfies $f(0) = 0$.
We want to find the largest value of $p$ such that for some constant $K$
\[
|f(h) - f(0)| = |f(h)| \le K |h^p| \quad \text{for small } h \tag{39}
\]
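We know from Taylor’s theorem (1.14) that
\begin{align*}
f(h) &= f(0) + h f'(0) + \frac{h^2}{2} f''(c) \quad \text{for some } c \text{ between } 0 \text{ and } h \tag{40} \\
&= \left[ \sin(x^2) + h \, 2x \cos(x^2) \right]_{x=0} + \frac{h^2}{2} \left( -4 \sin(c^2) c^2 + 2 \cos(c^2) \right) \tag{41} \\
&= h^2 \left( -2 \sin(c^2) c^2 + \cos(c^2) \right) \tag{42}
\end{align*}
So
\begin{align*}
|f(h)| &= h^2 \left| -2 \sin(c^2) c^2 + \cos(c^2) \right| \tag{43} \\
&\le h^2 \left( 2 c^2 \left| \sin(c^2) \right| + \left| \cos(c^2) \right| \right) \tag{44} \\
&\le h^2 \left( 2 c^2 c^2 + 1 \right) \tag{45} \\
&\le h^2 \left( 2 h^4 + 1 \right) \tag{46} \\
&\le h^2 (2 + 1) \quad \text{for } h \le 1 \tag{47} \\
&= 3 h^2 \tag{48}
\end{align*}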
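This tells us that Eq. (39) holds for $K = 3$, $p = 2$, where by “small” $h$ we mean $h \le 1$.
To show that 2 is indeed the largest value of $p$ for which we can make Eq. (39) hold, we simply expand the Taylor
series to one more order
\begin{align*}
f(h) &= f(0) + h f'(0) + \frac{h^2}{2} f''(0) + \frac{h^3}{6} f^{(3)}(c) \quad \text{for some } c \text{ between } 0 \text{ and } h \tag{49} \\
&= 0 + h^2 \left[ -2 \sin(x^2) x^2 + \cos(x^2) \right]_{x=0} + \frac{h^3}{6} f^{(3)}(c) \tag{50} \\
&= h^2 + \frac{h^3}{6} f^{(3)}(c) \tag{51}
\end{align*}
In other words, the $h^2$ term does not vanish like the ones before it, so we can’t do the same thing to show that
$|f(h)| \le K |h^3|$. Indeed $f(h)/h^2 \to 1$ as $h \to 0$, so $|f(h)|/|h|^p$ is unbounded near $h = 0$ for any $p > 2$.
So transforming
\[
|f(h) - f(0)| = |f(h)| \le 3 h^2, \quad h \le 1 \tag{53}
\]
back to our original problem Eq. (36) via $h = \frac{1}{n}$ we have
\[
\left| \sin\frac{1}{n^2} - \lim_{n \to \infty} \sin\frac{1}{n^2} \right| = \left| \sin\frac{1}{n^2} \right| \le 3 \frac{1}{n^2}, \quad 1 \le n \tag{54}
\]
so the sequence converges like $\frac{1}{n^2}$, or
\[
\sin\frac{1}{n^2} = O\left( \frac{1}{n^2} \right) \tag{55}
\]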
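A quick numerical check can make the rate plausible, although it proves nothing. This is a minimal Python sketch; the helper name `ratios` is illustrative. It confirms the bound $|\sin h^2| \le 3h^2$ at a few sample points and prints $f(h)/h^2$, which should settle near 1, next to $f(h)/h^3$, which should keep growing.

```python
import math

def ratios(n):
    """Return (f(h)/h^2, f(h)/h^3) for f(h) = sin(h^2) with h = 1/n."""
    h = 1.0 / n
    f = math.sin(h * h)
    return f / h**2, f / h**3

for n in (1, 2, 10, 100, 1000):
    h = 1.0 / n
    # Bound from Eq. (54): |sin(1/n^2)| <= 3/n^2 for n >= 1.
    assert abs(math.sin(h * h)) <= 3 * h * h
    r2, r3 = ratios(n)
    print(f"n={n:5d}  f/h^2={r2:.6f}  f/h^3={r3:.3e}")
```

A bounded first ratio together with an unbounded second ratio is what one would expect if $p = 2$ is the largest admissible exponent.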
1.3:7c convergence as h → 0
Find the rate of convergence of
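\[
\lim_{h \to 0} \frac{\sin(h) - h \cos(h)}{h} = 0 \tag{56}
\]
We want to examine the behaviour of the function
\[
f(h) = \frac{\sin(h) - h \cos(h)}{h} \tag{57}
\]
near $h = 0$. But we need to be careful since we have $h$ in the denominator, so we expand just the numerator in a
Taylor series.
The lowest order nontrivial Taylor series about $h = 0$ for sin and cos are
\begin{align*}
\sin(h) &= h - \frac{h^3}{6} \cos(c_1) \quad \text{for } c_1 \text{ between } 0 \text{ and } h \tag{58} \\
\cos(h) &= 1 - \frac{h^2}{2} \cos(c_2) \quad \text{for } c_2 \text{ between } 0 \text{ and } h \tag{59}
\end{align*}
So we have
\begin{align*}
|f(h) - f(0)| = |f(h)| &= \left| \frac{h - \frac{h^3}{6} \cos(c_1) - h \left( 1 - \frac{h^2}{2} \cos(c_2) \right)}{h} \right| \tag{60} \\
&= \left| 1 - \frac{h^2}{6} \cos(c_1) - 1 + \frac{h^2}{2} \cos(c_2) \right| \tag{61} \\
&= \left| -\frac{1}{6} \cos(c_1) + \frac{1}{2} \cos(c_2) \right| h^2 \tag{62} \\
&\le \left( \frac{1}{6} \left| \cos(c_1) \right| + \frac{1}{2} \left| \cos(c_2) \right| \right) h^2 \tag{63} \\
&\le \left( \frac{1}{6} + \frac{1}{2} \right) h^2 \tag{64} \\
&= \frac{2}{3} h^2 \tag{65}
\end{align*}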
So
\[
\frac{\sin(h) - h \cos(h)}{h} = O\left( h^2 \right) \tag{66}
\]
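The bound can be spot-checked numerically. This is a minimal Python sketch; the function name `f` simply mirrors Eq. (57). It asserts $|f(h)| \le \frac{2}{3}h^2$ at a few sample points and prints $f(h)/h^2$. From the series $\sin h - h\cos h = \frac{h^3}{3} - \frac{h^5}{30} + \dots$, that ratio should approach $\frac{1}{3}$, which sits comfortably below the constant $\frac{2}{3}$ obtained above.

```python
import math

def f(h):
    """f(h) = (sin(h) - h cos(h)) / h, as in Eq. (57)."""
    return (math.sin(h) - h * math.cos(h)) / h

for h in (0.5, 0.1, 0.01, 0.001):
    # Bound from Eq. (65): |f(h)| <= (2/3) h^2.
    assert abs(f(h)) <= (2.0 / 3.0) * h**2
    print(f"h={h:<6}  f(h)={f(h):.3e}  f(h)/h^2={f(h) / h**2:.6f}")
```

For much smaller `h` the subtraction in the numerator itself loses digits in double precision, so the printed ratio would eventually degrade; that is the same cancellation effect discussed in Section 1.2.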
1.3:14 Orders of convergence
Make a table listing $h$, $h^2$, $h^3$ and $h^4$ for $h = 0.5, 0.1, 0.01, 0.001$ and discuss the varying rates of convergence.
 h           h^2         h^3         h^4
 5.00e-01    2.50e-01    1.25e-01    6.25e-02
 1.00e-01    1.00e-02    1.00e-03    1.00e-04
 1.00e-02    1.00e-04    1.00e-06    1.00e-08
 1.00e-03    1.00e-06    1.00e-09    1.00e-12
Clearly the higher the power, the faster the convergence (i.e., $h > h^2 > h^3 > h^4$ for $0 < h < 1$). For every order of magnitude by
which we decrease $h$, $h^p$ decreases by $p$ orders of magnitude.
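The table is easy to regenerate. This is a minimal Python sketch; the names `hs`, `powers` and `rows` are illustrative.

```python
hs = (0.5, 0.1, 0.01, 0.001)
powers = (1, 2, 3, 4)

# One row per value of h, one column per power p.
rows = [[h**p for p in powers] for h in hs]

print("".join(f"{'h^' + str(p):>12}" for p in powers))
for row in rows:
    print("".join(f"{value:>12.2e}" for value in row))
```

Reading across a row shows the effect of raising the power; reading down a column shows how quickly a fixed power shrinks as $h$ decreases.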
Now suppose that $0 < q < p$ and that $F(h) = L + O(h^p)$. Show that $F(h) = L + O(h^q)$.
We know that
\[
|h^p| \le |h^q| \quad \text{for } q \le p, \ |h| \le 1 \tag{67}
\]
and the statement that $F(h) = L + O(h^p)$ means that
\[
|F(h) - L| \le K |h^p| \quad \text{for small } h \tag{68}
\]
So we can say
\begin{align*}
|F(h) - L| &\le K |h^p| \le K |h^q| \quad \text{for } |h| \le 1 \tag{69} \\
|F(h) - L| &\le K |h^q| \tag{70}
\end{align*}
which translates back into
\[
F(h) = L + O\left( h^q \right) \tag{71}
\]
6.1 Linear Algebra
6.1:10 Singular matrices
Given the linear system
\[
\begin{bmatrix} 1 & -1 & \alpha \\ -1 & 2 & -\alpha \\ \alpha & 1 & 1 \end{bmatrix} x
= \begin{bmatrix} -2 \\ 3 \\ 2 \end{bmatrix} \tag{72}
\]
For what values of α does the system have no solutions and infinitely many solutions?
Finding the values of α for which the determinant vanishes will only tell us if we have infinitely many or no
solutions but will not distinguish between the two cases. Furthermore part c asks us to solve the system for general
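α so we’ll go ahead and do that first and see what values of α might give us problems:
\begin{align*}
\left[ \begin{array}{ccc|c} 1 & -1 & \alpha & -2 \\ -1 & 2 & -\alpha & 3 \\ \alpha & 1 & 1 & 2 \end{array} \right]
&\mapsto
\left[ \begin{array}{ccc|c} 1 & -1 & \alpha & -2 \\ 0 & 1 & 0 & 1 \\ 0 & 1 + \alpha & 1 - \alpha^2 & 2(1 + \alpha) \end{array} \right] \tag{73} \\
&\mapsto
\left[ \begin{array}{ccc|c} 1 & -1 & \alpha & -2 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 - \alpha^2 & 1 + \alpha \end{array} \right] \tag{74} \\
&\mapsto
\left[ \begin{array}{ccc|c} 1 & -1 & \alpha & -2 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & \frac{1}{1 - \alpha} \end{array} \right] \quad \alpha \ne \pm 1 \tag{75} \\
&\mapsto
\left[ \begin{array}{ccc|c} 1 & -1 & 0 & -2 - \frac{\alpha}{1 - \alpha} \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & \frac{1}{1 - \alpha} \end{array} \right] \tag{76} \\
&\mapsto
\left[ \begin{array}{ccc|c} 1 & 0 & 0 & -1 - \frac{\alpha}{1 - \alpha} \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & \frac{1}{1 - \alpha} \end{array} \right] \tag{77} \\
&=
\left[ \begin{array}{ccc|c} 1 & 0 & 0 & -\frac{1}{1 - \alpha} \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & \frac{1}{1 - \alpha} \end{array} \right] \tag{78}
\end{align*}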
So
\begin{align*}
x_1 &= -\frac{1}{1 - \alpha} \tag{79} \\
x_2 &= 1 \tag{80} \\
x_3 &= \frac{1}{1 - \alpha} \tag{81}
\end{align*}
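We can see in the third row of Eq. (74) that $\alpha = \pm 1$ will cause problems. If $\alpha = 1$ we have an inconsistent system
(no solutions):
\[
\left[ \begin{array}{ccc|c} 1 & -1 & 1 & -2 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 0 & 2 \end{array} \right] \tag{82}
\]
If $\alpha = -1$, we have no constraint on $x_3$ (infinitely many solutions):
\[
\left[ \begin{array}{ccc|c} 1 & -1 & -1 & -2 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{array} \right] \tag{83}
\]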
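The general solution and the two exceptional values of α can be checked with exact rational arithmetic. This is a minimal Python sketch using only the standard `fractions` module; the helper names `residual` and `determinant` are illustrative. It substitutes Eqs. (79) to (81) back into the system for a few sample values of α and evaluates the determinant of the coefficient matrix, which should vanish exactly at α = ±1.

```python
from fractions import Fraction

def residual(alpha):
    """Substitute the general solution into the system; all zeros means it checks out."""
    alpha = Fraction(alpha)
    x1 = -1 / (1 - alpha)
    x2 = Fraction(1)
    x3 = 1 / (1 - alpha)
    return (
        x1 - x2 + alpha * x3 - (-2),
        -x1 + 2 * x2 - alpha * x3 - 3,
        alpha * x1 + x2 + x3 - 2,
    )

def determinant(alpha):
    """Cofactor expansion of the coefficient matrix along its first row."""
    a = Fraction(alpha)
    m = [[1, -1, a], [-1, 2, -a], [a, 1, 1]]
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

# Sample values of alpha away from the singular cases.
for alpha in (0, 2, Fraction(1, 2), -3):
    assert residual(alpha) == (0, 0, 0)

print(determinant(1), determinant(-1), determinant(2))
```

As the text notes, a vanishing determinant alone does not separate the two singular cases; that distinction still comes from the last row of the reduced augmented matrix.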