9.2 SAMPLE PROPORTIONS

OVERVIEW: The normal distribution is often extremely useful in analyzing sample proportions. This section describes the circumstances under which normal distribution properties can be used.

Consider a simple random sample (SRS) of 1,000 people from a large population. If X represents the number in this sample who are Republicans, then there are 1,001 possible values of X, namely 0, 1, 2, 3, ..., 998, 999, 1000. If p(hat) represents the sample proportion of Republicans, then there are 1,001 possible values of p(hat), namely 0/1000, 1/1000, 2/1000, ..., 998/1000, 999/1000, 1000/1000.

For a given sample, we might find p(hat) = .56. For another sample, we might find p(hat) = .52. We could choose many SRSs and calculate p(hat) for each sample. Under the conditions given below, we would expect the distribution of p(hat) to be approximately normal.

If we choose an SRS of size n from a large population with population proportion p having some characteristic of interest, and if p(hat) is the proportion of the sample having that characteristic, then
-the sampling distribution of p(hat) is approximately normal.
-the mean of the sampling distribution is p (the population parameter).
-the standard deviation of the sampling distribution is sqrt[p(1-p)/n].

It is reasonable to use the above statements when
-the population is at least 10 times as large as the sample (Rule of Thumb 1).
-np is at least 10 and n(1-p) is at least 10 (Rule of Thumb 2).

Example: Suppose it is known that 60% of the registered voters in a district of over 20,000 people are Republicans. If you choose an SRS of 1000 registered voters,
(a) what is the probability that the proportion of Republicans in the sample is between 58% and 62%?
(b) what is the probability that the sample will contain no more than 550 Republicans?

First, note that both rules of thumb are satisfied: the population of over 20,000 is more than 10 times the sample size of 1000, and np = (1000)(.6) = 600 and n(1-p) = (1000)(.4) = 400 are both at least 10.
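The two rules of thumb and the formulas for the mean and standard deviation can also be checked numerically. The following is a minimal Python sketch, not part of the original TI-83 workflow. The function names `check_conditions` and `sampling_distribution` are hypothetical, and the population size of 20,000 is an assumed lower bound, since the example only says "over 20,000".

```python
import math

def check_conditions(n, p, population_size):
    """Return (rule1, rule2): whether each rule of thumb is satisfied."""
    rule1 = population_size >= 10 * n                 # Rule of Thumb 1
    rule2 = n * p >= 10 and n * (1 - p) >= 10         # Rule of Thumb 2
    return rule1, rule2

def sampling_distribution(n, p):
    """Return (mean, standard deviation) of the sampling distribution of p(hat)."""
    return p, math.sqrt(p * (1 - p) / n)

# Values from the example; 20000 is an assumed lower bound on the population.
rule1, rule2 = check_conditions(n=1000, p=0.6, population_size=20000)
mean, sd = sampling_distribution(n=1000, p=0.6)

print("Rule of Thumb 1 satisfied:", rule1)
print("Rule of Thumb 2 satisfied:", rule2)
print("mean =", mean, " standard deviation =", round(sd, 4))
```

If either rule fails, the normal approximation described in this section should not be relied on.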
The sample proportion p(hat) has mean = .6 and standard deviation = sqrt[(.6)(.4)/1000] = .0155.

Response to (a): Using the TI-83, normalcdf(.58,.62,.60,.0155) = .8031, or 80.31%.

Response to (b): 550/1000 = 0.55. The probability of a sample proportion of at most 55% Republicans is normalcdf(-1E99,.55,.60,.0155) = 0.000628, or about 0.0628%.

Things we might note: The z-score of .55 is z = (.55 - .60)/.0155 = -3.226. That is, a proportion of .55 is more than 3 standard deviations below the mean. This represents a rather rare value in the normal distribution N(.60, .0155). Also, 0.000628 is approximately 1/1592. In other words, if we had around 1,600 random samples of size 1000, we would "expect" only one of them to have 550 or fewer Republicans.
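For readers without a TI-83, the same calculations can be sketched in Python using `NormalDist` from the standard `statistics` module (Python 3.8 or later). The helper `normalcdf` below is a hypothetical stand-in that mimics the calculator's argument order; it is not a library function. The standard deviation is the rounded value .0155 used above.

```python
from statistics import NormalDist

def normalcdf(lower, upper, mean, sd):
    """Area under the N(mean, sd) curve between lower and upper."""
    dist = NormalDist(mu=mean, sigma=sd)
    return dist.cdf(upper) - dist.cdf(lower)

mean, sd = 0.60, 0.0155          # rounded standard deviation from the text
p_hat = NormalDist(mu=mean, sigma=sd)

# (a) P(.58 <= p(hat) <= .62)
prob_a = normalcdf(0.58, 0.62, mean, sd)

# (b) P(p(hat) <= .55); the cdf already covers everything below .55,
# so no -1E99 lower bound is needed here.
prob_b = p_hat.cdf(0.55)

# z-score of .55 and the "one in how many samples" figure
z = (0.55 - mean) / sd
one_in = 1 / prob_b

print("(a)", round(prob_a, 4))
print("(b)", round(prob_b, 6))
print("z =", round(z, 3), " about 1 in", round(one_in))
```

The results should agree with the calculator values above to within rounding.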