251solnN1 4/8/08 (Open this document in 'Page Layout' view!)

N. Statistical Sampling.
1. Definitions.
2. Distribution of $\bar{x}$ and $\bar{p}$
Text 7.11, 7.14 [7.12, 7.15] (7.12, 7.15)
3. The Central Limit Theorem
Text 7.1, 7.5, 7.67 on CD, 7.68, 7.71 [7.1, 7.5, 7.41, 7.42, 7.45] (7.1, 7.5, 7.40, 7.41, 7.44). N1, N2.
----------------------------------------------------------------------

Problems about the sample proportion

In outline section L, we learned that if $\bar{p}$ is a sample proportion, $E(\bar{p}) = p$ and $\mathrm{Var}(\bar{p}) = \frac{pq}{n}$. In section M, we learned that the Binomial distribution can be approximated by the Normal distribution, so that $\bar{p} \sim N\left(p, \sqrt{\frac{pq}{n}}\right)$, where $0 \le p \le 1$, $x$ is the observed number of successes in a sample of $n$, $\bar{p} = \frac{x}{n}$ and $0 \le q = 1 - p \le 1$.

Exercise 7.11 [7.12 in 9th]: Out of a sample of 64 people, 48 are classified as successful. We need to find a) the sample proportion and b) the standard error of the sample proportion.
Solution: This problem gives $x = 48$ out of $n = 64$ and that $p = .7$.
a) $\bar{p} = \frac{x}{n} = \frac{48}{64} = .75$.
b) $\sigma_{\bar{p}} = \sqrt{\frac{pq}{n}} = \sqrt{\frac{.7(.3)}{64}} = \sqrt{.00328125} = .0573$. Remember that $E(\bar{p}) = p = .70$.

Exercise 7.14 [7.15 in 9th]: A pollster will forecast your candidate as the winner of an election if your candidate receives at least 55% of the vote in a sample. Find the probability of this occurring under the following circumstances: a) $n = 100$, $p = 50.1\%$, b) $n = 100$, $p = 60\%$, c) $n = 100$, $p = 49\%$. Repeat a) – c) with $n = 400$. Discuss.
This problem asks for $P(\bar{p} \ge .55)$ when $p$ has various values and $n = 100$. Because $\bar{p} \sim N\left(p, \sqrt{\frac{pq}{n}}\right)$, $z = \frac{\bar{p} - p}{\sigma_{\bar{p}}} = \frac{\bar{p} - p}{\sqrt{\frac{pq}{n}}}$.
a) $p = .501$: $P(\bar{p} \ge .55) = P\left(z \ge \frac{.55 - .501}{\sqrt{\frac{.501(.499)}{100}}}\right) = P\left(z \ge \frac{.55 - .501}{\sqrt{.00250}}\right) = P(z \ge 0.98) = .5 - .3365 = .1635$
b) $p = .60$: $P(\bar{p} \ge .55) = P\left(z \ge \frac{.55 - .60}{\sqrt{\frac{.60(.40)}{100}}}\right) = P\left(z \ge \frac{.55 - .60}{\sqrt{.00240}}\right) = P(z \ge -1.02) = .5 + .3461 = .8461$
c) $p = .49$: $P(\bar{p} \ge .55) = P\left(z \ge \frac{.55 - .49}{\sqrt{\frac{.49(.51)}{100}}}\right) = P\left(z \ge \frac{.55 - .49}{\sqrt{.002499}}\right) = P(z \ge 1.20) = .5 - .3849 = .1151$
d) If $n = 400$, the Instructor’s Solutions Manual provides the answers below.
Increasing the sample size by a factor of 4 decreases the standard error by a factor of 2.
(a) $P(p_s > 0.55) = P(Z > 1.96) = 0.0250$
(b) $P(p_s > 0.55) = P(Z > -2.04) = 0.9793$
(c) $P(p_s > 0.55) = P(Z > 2.40) = 0.0082$

Problems about the Sample Mean or $\bar{x}$

Exercise 7.1: If $x \sim N(100, 10)$ and $n = 25$, find the probability that $\bar{x}$ is: a) less than 95, b) between 95 and 97.5, c) above 102.2. d) Show a number $\bar{x}_{.65}$ that has 65% above it.
Solution: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{10}{\sqrt{25}} = \sqrt{\frac{10^2}{25}} = \sqrt{\frac{100}{25}} = \sqrt{4} = 2$, so $\bar{x} \sim N(100, 2)$ and $z = \frac{\bar{x} - 100}{2}$. Make diagrams!
a) $P(\bar{x} < 95) = P\left(z < \frac{95 - 100}{2}\right) = P(z < -2.50) = .5 - .4938 = .0062$
b) $P(95 < \bar{x} < 97.5) = P\left(\frac{95 - 100}{2} < z < \frac{97.5 - 100}{2}\right) = P(-2.50 < z < -1.25) = P(-2.50 < z < 0) - P(-1.25 < z < 0) = .4938 - .3944 = .0994$
c) $P(\bar{x} > 102.2) = P\left(z > \frac{102.2 - 100}{2}\right) = P(z > 1.10) = .5 - .3643 = .1357$
d) (In 9th edition only) $P(99 < \bar{x} < 101) = P\left(\frac{99 - 100}{2} < z < \frac{101 - 100}{2}\right) = P(-0.50 < z < 0.50) = P(-0.50 < z < 0) + P(0 < z < 0.50) = .1915 + .1915 = .3830$
d) in 10th edition, e) in 9th edition. We want the point $\bar{x}_{.65}$ defined by $P(\bar{x} > \bar{x}_{.65}) = .65$. Make a diagram for $z$, showing zero in the middle, 100% - 65% = 35% below $z_{.65}$ (which is below zero), 50% above zero and 65% - 50% = 15% between $z_{.65}$ and zero. Note that, because 35% is below it, $z_{.65} = -z_{.35}$. So we check the table to find the value of $z_{.35}$, defined by $P(0 < z < z_{.35}) = .15$. The closest that we can come on the Normal table is $P(0 < z < 0.39) = .1517$. So $z_{.35} = 0.39$. This means $z_{.65} = -0.39$ and $\bar{x}_{.65} = \mu + z_{.65}\sigma_{\bar{x}} = 100 - 0.39(2) = 99.22$. Note that the bottom of the t table gives a more accurate value of $z_{.35} = 0.385$.
f) in 9th edition. Redo the problem with $n = 16$. Remember that for a continuous distribution, ">" and "≥" are essentially the same. The Instructor’s Solutions Manual provides the answers below.
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{10}{\sqrt{16}} = \sqrt{\frac{10^2}{16}} = \sqrt{6.25} = 2.50$. So $\bar{x} \sim N(100, 2.5)$.
(a) $P(\bar{x} < 95) = P(z < -2.00) = .0228$
(b) $P(95 < \bar{x} < 97.5) = P(-2.00 < z < -1.00) = .1359$
(c) $P(\bar{x} > 102.2) = P(z > 0.88) = .1894$
(d) $P(99 < \bar{x} < 101) = P(-0.40 < z < 0.40) = .3108$
(e) $\bar{x}_{.65} = \mu + z_{.65}\sigma_{\bar{x}} = 100 - 0.39(2.50) = 99.025$
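The table lookups in Exercise 7.1 can be cross-checked numerically. The following is a minimal sketch, assuming Python 3.8+ and its standard-library `statistics.NormalDist`; the helper name `sample_mean_dist` is made up for illustration. Because it uses the exact Normal CDF instead of z rounded to two decimals, the results should agree with the table answers only to about three decimal places.

```python
from math import sqrt
from statistics import NormalDist


def sample_mean_dist(mu, sigma, n):
    """Sampling distribution of the mean: Normal with standard error sigma / sqrt(n).

    Hypothetical helper for illustration; assumes the parent population is Normal.
    """
    return NormalDist(mu, sigma / sqrt(n))


# Exercise 7.1 with n = 25, so the standard error is 10 / 5 = 2.
xbar = sample_mean_dist(100, 10, 25)

p_a = xbar.cdf(95)                    # a) P(xbar < 95)
p_b = xbar.cdf(97.5) - xbar.cdf(95)   # b) P(95 < xbar < 97.5)
p_c = 1 - xbar.cdf(102.2)             # c) P(xbar > 102.2)
x_65 = xbar.inv_cdf(0.35)             # point with 35% below it, hence 65% above it

print(round(p_a, 4), round(p_b, 4), round(p_c, 4), round(x_65, 2))
```

Calling `sample_mean_dist(100, 10, 16)` instead repeats the check for the n = 16 version of the problem.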
Exercise 7.5: The diameter of Ping-Pong balls manufactured in a large factory is expected to be approximately normally distributed with a mean of 1.30 inches and a standard deviation of 0.04 inch. In the 9th edition the author asks
a) What is the probability that a randomly selected Ping-Pong ball will have a diameter less than 1.28 inches?
b) What is the probability that the diameter is between 1.31 and 1.33 inches?
c) Between what two values (symmetrically distributed around the mean) will 60% of the balls fall?
Now the author asks parts d) – h), which are parts a) – d) in the 10th edition. However, it is important to compare the answers to the questions about probabilities for a sample of 1, as above, with the answers in d) – h) below. If many random samples of 16 balls are selected,
d) What will be the values of the population mean and standard error of the mean?
e) What distribution will the sample means follow?
f) What proportion of the sample means (or for an individual sample, what is the probability that the sample mean) will be less than 1.28 inches?
g) What proportion of the sample means will be between 1.31 and 1.33 inches?
h) Between what two values (symmetrically distributed around the mean) will 60% of the sample means be?
So now in the original, better version of this problem the author asks you to
i) Compare the answers of (a) with (f) and (b) with (g). Discuss.
j) Explain the differences between the results in (c) and (h).
k) Which is more likely to occur – an individual ball above 1.34 inches, a sample mean above 1.32 inches in a sample of size 4, or a sample mean above 1.31 inches in a sample of size 16? Explain.
Note the follow-up below.
Solution: We have $x \sim N(1.30, 0.04)$, and a) – c) are absolutely straightforward Normal distribution problems.
(a) $P(x < 1.28) = P\left(z < \frac{1.28 - 1.30}{0.04}\right) = P(z < -0.50) = .5 - .1915 = .3085$
(b) $P(1.31 < x < 1.33) = P(0.25 < z < 0.75) = 0.2734 - 0.0987 = 0.1747$
(c) A symmetrical region around the mean with 60% probability.
Make a diagram for $z$, showing zero in the middle. The area we want can be split in two by zero, so that $P(0 < z < z_{.20}) = .30$. If we look for a probability of .30 on the Normal table, the closest we can come is $P(0 < z < 0.84) = .2995$. Our two values of $z$ are $\pm z_{.20} = \pm 0.84$, and these can be made values of $x$ by using $x = \mu \pm z_{.20}\sigma = 1.30 \pm 0.84(0.04) = 1.30 \pm .0336$ or 1.2664 to 1.3336, and we can show that $P(1.2664 < x < 1.3336) \approx .60$. If we use the bottom of the t table, $z_{.20} = 0.842$, a more accurate value.
(d) If $n = 16$, $\mu_{\bar{x}} = \mu = 1.30$ and $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{0.04}{\sqrt{16}} = \sqrt{\frac{0.04^2}{16}} = \sqrt{0.0001} = 0.01$.
(e) Because the population diameter of Ping-Pong balls is approximately normally distributed, the sampling distribution of the means of samples of 16 will also be approximately normally distributed. $\bar{x} \sim N(1.30, 0.01)$.
(f) $P(\bar{x} < 1.28) = P(z < -2.00) = .5 - 0.4772 = 0.0228$
(g) $P(1.31 < \bar{x} < 1.33) = P(1.00 < z < 3.00) = .4987 - .3413 = 0.1574$
(h) A symmetrical region around the mean with 60% probability. We already know that $z_{.20} = 0.842$. Our interval will be $\bar{x} = \mu \pm z_{.20}\sigma_{\bar{x}} = 1.30 \pm 0.842(0.01) = 1.30 \pm .0084$ or 1.2916 to 1.3084.
(i) When samples of size 16 are taken rather than individual values (samples of n = 1), more values lie closer to the mean and fewer values lie farther away from the mean with the increased sample size. This occurs because the standard deviation of the sampling distribution, the standard error, is given by $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$. As $n$ increases, the value of the denominator increases, resulting in a smaller value of the overall fraction.
(j) The standard error for the distribution of sample means of size 16 is 1/4 of the population standard deviation of individual values, which means that the sampling distribution is more concentrated around the population mean.
(k) They are equally likely to occur (probability = 0.1587) since as $n$ increases, more sample means will be closer to the mean of the distribution. In each case the cutoff is exactly one standard error above the mean, $\frac{1.34 - 1.30}{0.04} = \frac{1.32 - 1.30}{0.02} = \frac{1.31 - 1.30}{0.01} = 1$, so each probability is $P(z > 1.00) = .5 - .3413 = .1587$.
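The comparison in (k) can be sketched in a few lines of code. This is only an illustration, assuming Python 3.8+ with the standard-library `statistics.NormalDist`; the function name `upper_tail` and the constants `MU` and `SIGMA` are made up here. It computes the upper-tail probability for an individual ball (n = 1) and for sample means of size 4 and 16, each of which should come out near 0.1587.

```python
from math import sqrt
from statistics import NormalDist

MU, SIGMA = 1.30, 0.04  # population mean and standard deviation in inches


def upper_tail(cutoff, n):
    """P(mean of n diameters > cutoff), assuming Normally distributed diameters.

    Hypothetical helper; n = 1 corresponds to a single ball.
    """
    return 1 - NormalDist(MU, SIGMA / sqrt(n)).cdf(cutoff)


# The three events from part (k): (cutoff, sample size).
cases = [(1.34, 1), (1.32, 4), (1.31, 16)]
k_probs = {n: upper_tail(cutoff, n) for cutoff, n in cases}

for n, p in k_probs.items():
    print(f"n = {n:2d}: probability = {p:.4f}")
```

The cutoffs shrink toward the mean at the same rate as the standard error, which is why the three probabilities coincide.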
Follow up: From what you have learned previously, if a random sample of 16 balls is selected, l) what is the probability that all 16 balls will have a diameter less than 1.28 inches? m) What is the probability that at least one ball will be between 1.31 and 1.33 inches?
Solution: These are binomial problems.
(l) In a) we found $P(x < 1.28) = .3085$. If $y$ is the number of balls out of 16 with diameters below 1.28 inches, it has the binomial distribution with $n = 16$, $p = .3085$ and $q = 1 - p = .6915$. So $P(y = 16) = C_{16}^{16} p^{16} q^{0} = (.3085)^{16} = 6.73104 \times 10^{-9} = .0000000067$.
(m) In b) we found $P(1.31 < x < 1.33) = 0.1747$. If $y$ is the number of balls out of 16 with diameters between 1.31 and 1.33 inches, it has the binomial distribution with $n = 16$, $p = .1747$ and $q = 1 - p = .8253$. So $P(y \ge 1) = 1 - P(y = 0) = 1 - C_{0}^{16} p^{0} q^{16} = 1 - (.8253)^{16} = 1 - .0463 = .9537$.
This is a great variation on the previous problem for a final exam.

Exercise 7.67 on CD [7.41 in 9th] (7.40 in 8th): Given that $N = 80$ and $n = 10$ and the sample is obtained without replacement, determine the finite population correction factor.
Solution: If we have $N = 80$ and $n = 10$, the finite population correction factor is $\sqrt{\frac{N - n}{N - 1}} = \sqrt{\frac{80 - 10}{80 - 1}} = \sqrt{\frac{70}{79}} = .9413$.

Exercise 7.68 on CD [7.42 in 9th] (7.41 in 8th): Which of the following finite population factors will have a greater effect in reducing the standard error – one based on a sample of size 100 selected without replacement from a population of size 400 or one based on a sample of size 200 selected without replacement from a population of size 900? Explain.
Solution: For $N = 400$ and $n = 100$, $\sqrt{\frac{N - n}{N - 1}} = \sqrt{\frac{400 - 100}{400 - 1}} = \sqrt{\frac{300}{399}} = .8671$. For $N = 900$ and $n = 200$, $\sqrt{\frac{N - n}{N - 1}} = \sqrt{\frac{900 - 200}{900 - 1}} = \sqrt{\frac{700}{899}} = .8824$. The first is more effective because the sample is a larger part of the population.

Exercise 7.71 on CD [7.45 in 9th] (7.44 in 8th): The amount of time a bank teller spends with each customer has a population mean $\mu = 3.10$ minutes and a standard deviation $\sigma = 0.40$ minute.
If a random sample of 16 customers is selected without replacement from a population of 500 customers, a) what is the probability that the average time spent per customer is less than 3 minutes? b) There is an 85% chance that the sample mean will be below how many minutes?
Solution: $x \sim N(3.10, 0.40)$, $N = 500$, $n = 16$.
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}\sqrt{\frac{N - n}{N - 1}} = \frac{0.40}{\sqrt{16}}\sqrt{\frac{500 - 16}{500 - 1}} = \sqrt{\frac{.40^2}{16} \cdot \frac{484}{499}} = \sqrt{0.0096994} = .0985$
Note that since the sample size is less than 5% of the population, the finite population correction has almost no effect. This still shows how to use the correction even if it is not needed. $\bar{x} \sim N(3.10, 0.0985)$.
a) $P(\bar{x} < 3) = P\left(z < \frac{3 - 3.10}{.0985}\right) = P(z < -1.02) = P(z < 0) - P(-1.02 < z < 0) = .5 - .3461 = .1539$.
b) We want the 85th percentile, $z_{.15}$, a point with 15% above it and 85% below it. Make a diagram and show that $P(0 < z < z_{.15}) = .3500$. If we use the Normal table, the closest we can come is $P(0 < z < 1.04) = .3508$. So $z_{.15} \approx 1.04$. If we use the t table we get $z_{.15} = 1.036$. Finally $\bar{x}_{.15} = \mu + z_{.15}\sigma_{\bar{x}} = 3.10 + 1.036(.0985) = 3.202$.
You should be able to show that $P(\bar{x} < 3.202) = P\left(z < \frac{3.202 - 3.10}{0.0985}\right) = P(z < 1.04) = .5 + .3508 = .8508 \approx 85\%$.

Problem N1: The average life of a Toyota Caramba automobile is 44 months with a standard deviation of 18 months.
a) From a sample of 36, what is the probability that we find an average life below 38 months?
Solution: $x \sim N(44, 18)$, $n = 36$. Remember! The sample mean is Normally distributed if the parent distribution is Normal, and approximately Normally distributed for large $n$. The mean is $E(\bar{x}) = \mu$, and the standard deviation is $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$ if the sample is less than 5% of the population or $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}\sqrt{\frac{N - n}{N - 1}}$ if the sample is more than 5% of the population. In this part of the problem we can assume that it is approximately true that $\bar{x} \sim N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$ and $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{18}{\sqrt{36}} = \frac{18}{6} = 3$. So $\bar{x} \sim N(44, 3)$ and $P(\bar{x} < 38) = P\left(z < \frac{38 - 44}{3}\right) = P(z < -2.00) = .5 - .4772 = .0228$. Make a diagram for this one - you can reuse it for b).
b) Actually only 200 Toyota Carambas were ever produced. Redo part a), continuing to assume a sample of 36.
Solution: $x \sim N(44, 18)$, $n = 36$, but this time $N = 200$, and since $n$ is more than 5% of the population size, we must use the finite population correction factor. Thus $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}\sqrt{\frac{N - n}{N - 1}} = \frac{18}{\sqrt{36}}\sqrt{\frac{200 - 36}{200 - 1}} = 2.72343$ and $\bar{x} \sim N(44, 2.72343)$. So $P(\bar{x} < 38) = P\left(z < \frac{38 - 44}{2.72343}\right) = P(z < -2.20) = P(z < 0) - P(-2.20 < z < 0) = .5 - .4861 = .0139$.
c) An average package weighs 44 lbs. with a standard deviation of 18 lbs. A Toyota Caramba will carry 1368 lbs. If I must deliver 36 packages, what is the chance that my vehicle will not be overloaded?
Solution: Once again $x \sim N(44, 18)$, but this is a birthday party problem about $\sum x$. We want $P\left(\sum x < 1368\right)$. But if we take 36 packages, this is the same as saying that the mean is less than $\frac{1368}{36} = 38$. So we are looking for $P(\bar{x} < 38)$, and the answer is the same as to part a).

Problem N2: For my national fleet (all the same vehicle), mean weekly gasoline consumption is 16.9 gallons with a standard deviation of 3.2 gallons. In a local garage I have 875 gallons of gas and 50 vehicles. What is the probability of running out of gas this week?
Solution: If we have 50 vehicles, the mean gas consumption is approximately Normally distributed. This is a birthday party problem; we run out of gas if $\sum x$ is greater than 875, so we want the probability of that event. But for 50 vehicles this is the same as saying that the average gas use is greater than $\frac{875}{50} = 17.5$. For $\bar{x}$, the standard deviation (called the standard error of the mean) is $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{3.2}{\sqrt{50}} = 0.4525$. So
$P\left(\sum x > 875\right) = P(\bar{x} > 17.5) = P\left(z > \frac{17.5 - 16.9}{0.4525}\right) = P(z > 1.33) = P(z > 0) - P(0 < z < 1.33) = .5 - .4082 = .0918$
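The standard-error rule used in Problems N1 and N2 (with and without the finite population correction) can be sketched as a small function. This is a rough check under stated assumptions: Python 3.8+ with the standard-library `statistics.NormalDist`, and a hypothetical helper named `standard_error`. It uses the exact Normal CDF, so the probabilities should land close to the table-based answers, with small differences caused by rounding z to two decimals.

```python
from math import sqrt
from statistics import NormalDist


def standard_error(sigma, n, N=None):
    """Standard error of the mean, sigma / sqrt(n).

    If a population size N is supplied, the finite population correction
    sqrt((N - n) / (N - 1)) is applied. Hypothetical helper for illustration.
    """
    se = sigma / sqrt(n)
    if N is not None:
        se *= sqrt((N - n) / (N - 1))
    return se


# Problem N1 a): effectively infinite population.
se_a = standard_error(18, 36)
p_a = NormalDist(44, se_a).cdf(38)

# Problem N1 b): only N = 200 cars exist, so the correction applies.
se_b = standard_error(18, 36, N=200)
p_b = NormalDist(44, se_b).cdf(38)

# Problem N2: total use above 875 gallons is the same as mean use above 875 / 50.
se_n2 = standard_error(3.2, 50)
p_n2 = 1 - NormalDist(16.9, se_n2).cdf(875 / 50)

print(round(se_a, 4), round(p_a, 4))
print(round(se_b, 5), round(p_b, 4))
print(round(se_n2, 4), round(p_n2, 4))
```

Converting a statement about a total into a statement about the mean, as in the last case, is the same step the solutions call a birthday party problem.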