Supplement for Section 12.4 on Normal Distribution

The first part of Section 12.4 deals with any normally distributed random variable X. It relates an interval to the probability that X is in that interval. In the special case where the center of the interval is the mean for X, the interval is the “confidence interval,” the probability is the “level of confidence,” and the distance from the center to the ends of the confidence interval is the “margin of error.” The level of confidence is determined by the margin of error, specifically by how many standard deviations the end points of the interval are from the center.

The number of standard deviations is calculated by Z = (X - μ)/σ, where X is an end point, μ is the mean, and σ is one standard deviation. Z is positive if X is the upper end point and negative if X is the lower one. The margin of error is calculated by M.E. = Zσ, using the positive value of Z.

Two typical types of problems are illustrated by the following examples. For these examples, assume that X has a mean μ = 20 and a standard deviation σ = 2.

Example 1. Find P(19 < X < 21).
The upper end point is Z = (21 - 20)/2 = 0.5 standard deviations above the mean, and the lower end point is Z = (19 - 20)/2 = -0.5, or 0.5 standard deviations below the mean. Thus, P(19 < X < 21) = P(-.5 < Z < .5) = P(Z < .5) - P(Z < -.5) = .6915 - .3085 = .383. That is, the probability that the value of the random variable X is between 19 and 21 is .383.

Example 2. Find P(17 < X < 23).
The upper end point is Z = (23 - 20)/2 = 1.5 standard deviations above the mean, and the lower end point is 1.5 standard deviations below the mean. Thus, P(17 < X < 23) = P(-1.5 < Z < 1.5) = P(Z < 1.5) - P(Z < -1.5) = .9332 - .0668 = .8664. That is, the probability that the value of the random variable X is between 17 and 23 is .8664.

Note that for the second example, the margin of error is 3 times as large, because Z is 3 times as large, so the level of confidence is also larger. The bigger the target, the more likely you are to hit it.

Example 3.
Find a 95% confidence interval for X.
The probability that X is outside the interval is .05, so the probability that X is less than the upper end point of the interval is .975 (and similarly the probability that it is less than the lower end point is .025). From the table, the upper end point must be approximately 1.96 standard deviations above the mean, i.e., X = 20 + (1.96 × 2) = 23.92. Similarly, the lower end point is X = 20 - (1.96 × 2) = 16.08, and the 95% confidence interval is the interval (16.08, 23.92). Verifying, P(16.08 < X < 23.92) = P(-1.96 < Z < 1.96) = P(Z < 1.96) - P(Z < -1.96) = .975 - .025 = .95.
Note: If you approximate 1.96 by 2, which we will do in later examples, you get the interval (16, 24).

Example 4. Find a 92% confidence interval for X.
The probability that X is outside the interval is .08, so the probability that X is less than the upper end point is .96. By the table, that point is 1.75 standard deviations above the mean, i.e., X = 20 + (1.75 × 2) = 23.5; and the lower end point is X = 20 - (1.75 × 2) = 16.5. Thus, the interval is (16.5, 23.5).
Note: Example 4 has a smaller margin of error than Example 3 since we don’t seek as high a level of confidence.

The most important application of these concepts deals with sampling. Assume p is the proportion of a population of size N that shares some property, for example being female or holding some opinion. You choose a sample of size n, and p̂ is the corresponding proportion of the sample that shares the property. For example, from 600,000 women and 400,000 men, you choose 620 women and 380 men. If we are considering the proportion of women, then N = 1,000,000, p = .6, n = 1,000, and p̂ = .62. If, from the same population, you instead choose 590 women and 410 men, then p = .6 and p̂ = .59. Here p is an example of a population parameter, and p̂ is a sample statistic or sample proportion.

We expect a parameter and a statistic to be “close” to one another “most” of the time. There are 2 reasons that they may differ: bias and variability. In theory, we eliminate bias by choosing random samples.
However, we cannot eliminate variability. You see the effects of variability whenever you flip fair coins and do not get exactly half heads.

We assume that we choose random samples, all of size n, from a population of size N. We assume that N is much larger than n; otherwise we could just take a census of the whole population. There is only one value for p, the proportion of the population. However, the sample proportions p̂ will vary from sample to sample. We can measure “close” by margin of error and “most” by level of confidence based on the following 3 facts:

1. The values of p̂ are approximately normal. Thus, p̂ can be used in place of X.
2. The mean for p̂ is μ = p. Values of p̂ that are larger than p are balanced by values that are smaller.
3. The standard deviation of p̂ is σ = √(p(1 - p)/n). In cases where you don’t know p, σ can be approximated by the standard error, which is √(p̂(1 - p̂)/n). In any case, 1/(2√n) is a good approximation for σ, and we will use that. (It equals σ exactly when p = .5 and never underestimates σ, since p(1 - p) is at most 1/4.)

Note that larger n results in smaller σ and so less variability. Thus, p and p̂ are more likely to be close together if the sample size is large. Note also that N, the population size, does not come into the equation. The taste of the soup that you get depends on the size of the spoon, not the size of the pot.

In the next 3 examples, assume you choose a random sample of size n = 400 from a population of 600,000 females and 400,000 males, and p̂ is the proportion of females in the sample.

Example 5. Find P(.58 < p̂ < .62).
μ = p = .6 and σ = 1/(2√400) = 1/40 = .025; so for p̂ = .62, Z = (.62 - .6)/.025 = .8; for p̂ = .58, Z = (.58 - .6)/.025 = -.8. Thus P(.58 < p̂ < .62) = P(-.8 < Z < .8) = P(Z < .8) - P(Z < -.8) = .7881 - .2119 = .5762.

Example 6. Find P(.56 < p̂ < .64).
As in Example 5, for p̂ = .64, Z = (.64 - .6)/.025 = 1.6, and for p̂ = .56, Z = (.56 - .6)/.025 = -1.6. Thus P(.56 < p̂ < .64) = P(-1.6 < Z < 1.6) = .9452 - .0548 = .8904.

Example 7. Find a 95% confidence interval for p̂.
As in Example 3, with 1.96 approximated by 2, p̂ = μ + 2σ = .6 + 2 × 0.025 = .65 and p̂ = μ - 2σ = .6 - 2 × 0.025 = .55.
Thus (.55, .65) is the 95% confidence interval.

Example 8. Find a 92% confidence interval for p̂.
As in Example 4, p̂ = .6 + 1.75 × 0.025 = .644 and p̂ = .6 - 1.75 × 0.025 = .556. Thus (.556, .644) is the 92% confidence interval.

Example 9. Repeat Example 5, except assume a sample size of n = 900.
Now σ = 1/(2√900) = 1/60 ≈ .0167; so Z = (.62 - .6)/.0167 = 1.2 and Z = (.58 - .6)/.0167 = -1.2. Thus P(.58 < p̂ < .62) = P(-1.2 < Z < 1.2) = .8849 - .1151 = .7698. Note that the larger sample produces a larger level of confidence.

For each of the last 5 examples, we assumed that we knew the value of p and found confidence levels for p̂. A case in which we might wish to perform such an analysis would be one in which a census had determined the proportion p of females in a population. If someone had chosen a sample for some purpose (such as hiring) for which the proportion was significantly different, we could compute how small the probability was that such a difference occurred randomly. Such evidence could be used in a discrimination case.

Much more common is the case where we have a specific value for p̂ and we wish to use it to estimate p. In this case, we carry out the same calculations; we just reverse the roles of p and p̂. If you knew that someone shot at a target, and you didn’t know where the target was but you knew where the shot hit, you would expect that the target was in the vicinity of where the shot hit.

In the next 2 examples, assume that from a population of 1,000,000 voters you choose a sample of 540 yes voters and 360 no voters, and p is the proportion of yes voters in the population.

Example 10. Find P(.56 < p < .64).
p̂ = 540/900 = .6; σ = 1/(2√900) = 1/60 ≈ .0167; Z = (.64 - .6)/.0167 = 2.4; Z = (.56 - .6)/.0167 = -2.4. Thus P(.56 < p < .64) = P(-2.4 < Z < 2.4) = .9918 - .0082 = .9836, so (.56, .64) is approximately a 98% confidence interval.

Example 11. Find a 92% confidence interval for p.
As in Example 8, p = .6 + 1.75 × .0167 = .629 and p = .6 - 1.75 × .0167 = .571. Thus (.571, .629) is the 92% confidence interval.

In all of the previous examples, we were given a sample size.
We could then choose the margin of error and calculate the level of confidence, or vice versa. We could not prescribe both. If we want both a high level of confidence and a low margin of error, we must choose a sufficiently large sample.

Example 12. How large a sample must one choose in order to have a 3% margin of error with 95% confidence?
In order to have 95% confidence, the margin of error must be 2σ. We can use 1/(2√n) for σ. Thus .03 = 2 × 1/(2√n) = 1/√n, from which it follows that n = 1/(.03 × .03) ≈ 1111. This is the sample size for most poll results that you see reported.

Exercises

In exercises 1 and 2, assume the length of fish is normally distributed with a mean of 30 cm and a standard deviation of 4 cm.
1. Find the probability that a fish is between 23.2 cm and 36.8 cm.
2. Find a 98% confidence interval for the length of fish.

In exercises 3, 4, 5, and 6, assume you choose a random sample of 900 from 600,000 females and 400,000 males.
3. What proportion of the population is female?
4. Find the probability that the proportion of females in the sample is between .56 and .64.
5. Find a 95% confidence interval for the proportion of females in the sample.
6. Find a 98% confidence interval for the proportion of females in the sample.

In exercises 7, 8, and 9, assume you choose a random sample of 540 yes voters and 360 no voters from a population of 900,000 voters.
7. What proportion of the sample is yes voters?
8. Find the probability that the proportion of yes voters in the population is between .56 and .64.
9. Find a 98% confidence interval for the proportion of yes voters in the population.

10. How large a sample must you choose in order to have a 2% margin of error with a 98% level of confidence?
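
The table lookups in the worked examples can also be checked numerically. The following is a minimal sketch, not part of the original supplement. It assumes Python 3.8 or later and uses the standard library's statistics.NormalDist in place of the printed normal table, so results may differ from the table values in the last digit. The function names (prob_between, confidence_interval, proportion_sd, sample_size) are hypothetical helpers introduced only for illustration.

```python
from math import sqrt
from statistics import NormalDist

# Standard normal distribution (mean 0, standard deviation 1); stands in for the table.
STD_NORMAL = NormalDist()


def prob_between(low, high, mean, sd):
    """P(low < X < high) for a normal X, using Z = (X - mean) / sd."""
    z_low = (low - mean) / sd
    z_high = (high - mean) / sd
    return STD_NORMAL.cdf(z_high) - STD_NORMAL.cdf(z_low)


def confidence_interval(level, mean, sd):
    """Interval centered at the mean that contains X with probability `level`."""
    # The upper end point has probability (1 + level) / 2 below it, e.g. .975 for 95%.
    z = STD_NORMAL.inv_cdf((1 + level) / 2)
    margin = z * sd  # margin of error = Z * sigma
    return mean - margin, mean + margin


def proportion_sd(n):
    """The supplement's approximation 1 / (2 * sqrt(n)) for the sd of p-hat."""
    return 1 / (2 * sqrt(n))


def sample_size(margin, z=2):
    """Solve margin = z * 1 / (2 * sqrt(n)) for n (z = 2 approximates 95%)."""
    return (z / (2 * margin)) ** 2


# Example 1: mean 20, standard deviation 2.
print(round(prob_between(19, 21, 20, 2), 3))  # about .383

# Example 3: 95% confidence interval, about (16.08, 23.92).
print(tuple(round(x, 2) for x in confidence_interval(0.95, 20, 2)))

# Example 5: n = 400, p = .6, so sigma is approximated by .025.
print(round(prob_between(0.58, 0.62, 0.6, proportion_sd(400)), 3))

# Example 12: 3% margin of error at roughly 95% confidence, about 1111.
print(round(sample_size(0.03)))
```

Because confidence_interval uses the exact multiplier (about 1.96) rather than the rounded value 2, it will give slightly narrower intervals than Examples 7 and later, which round 1.96 up to 2.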