Student Notes - Prep Session
Topic: Sampling Distributions

Content

The AP Statistics topic outline contains the following list of items related to sampling distributions. (Items (4), (5), and (8) will not be covered in this session.)

1. Sampling distribution of a sample proportion
2. Sampling distribution of a sample mean
3. Central Limit Theorem
4. Sampling distribution of a difference between two independent sample proportions
5. Sampling distribution of a difference between two independent sample means
6. Simulation of sampling distributions
7. t-distribution
8. Chi-square distribution

Sampling distributions are an extension of probability, so many free response questions on sampling distributions also include parts that draw on material reviewed in the earlier prep session on probability.

Be sure you understand:

1. The difference between a parameter and a statistic
2. What we mean by the sampling distribution of a statistic (that is, the distribution of the values of that statistic obtained from all possible samples of a given size from a given population)
3. What we mean by an unbiased statistic
4. The formulas for the standard deviations of p̂ and x̄ should be used only when the population is at least 10 times as large as the sample
5. The sampling distribution of p̂ is approximately normal when the sample size is large (your textbook will have a definition of large, for example np ≥ 10 and n(1 − p) ≥ 10)
6. The sampling distribution of x̄ is normal, regardless of sample size, if the underlying population is normally distributed
7. The sampling distribution of x̄ is approximately normal, regardless of the shape of the underlying population, when the sample size is large (according to the Central Limit Theorem). In this case n ≥ 30 is usually sufficiently large.
8. The CLT is a statement about shape. It says that the sampling distribution of sample means becomes more nearly normal as the sample size increases.
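Several of the points in this list (the mean and standard deviation of the sampling distribution of x̄, and the CLT as a statement about shape) can be explored by simulation. The following is a minimal sketch, not part of the original notes. It assumes a hypothetical right-skewed population (exponential with mean 1 and standard deviation 1), and the function names `simulate_sample_means` and `skewness` are made up for illustration.

```python
import math
import random
import statistics


def simulate_sample_means(n, reps=20000, seed=1):
    """Return the means of `reps` random samples of size n drawn from an
    exponential population (mean 1, SD 1, strongly right-skewed).
    The population is an assumption chosen only for illustration."""
    rng = random.Random(seed)
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(reps)
    ]


def skewness(values):
    """Average cubed z-score; close to 0 for a symmetric distribution."""
    center = statistics.fmean(values)
    spread = statistics.pstdev(values)
    return statistics.fmean([((v - center) / spread) ** 3 for v in values])


summary = {}
for n in (2, 30):
    means = simulate_sample_means(n)
    summary[n] = {
        "mean": statistics.fmean(means),      # should be near the population mean, 1
        "sd": statistics.pstdev(means),       # should be near sigma / sqrt(n)
        "sd_formula": 1 / math.sqrt(n),       # sigma / sqrt(n) with sigma = 1
        "skewness": skewness(means),          # should shrink as n grows
    }
    print(n, {key: round(value, 3) for key, value in summary[n].items()})
```

In a run of this sketch, the simulated means should center near 1 for both sample sizes, the simulated standard deviations should sit close to 1/√n, and the skewness for n = 30 should be noticeably smaller than for n = 2, which is the "shape" claim of the CLT.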
Formulas

You will want to be familiar with the probability formulas that are provided on the exam. A partial list of formulas related to probability on the exam formula sheet is provided here. Note that several relate to the sampling distribution of sample means and sample proportions:

If X has a binomial distribution with parameters n and p, then:

    \( P(X = k) = \binom{n}{k} p^k (1 - p)^{n - k} \)
    \( \mu_X = np \)
    \( \sigma_X = \sqrt{np(1 - p)} \)
    \( \mu_{\hat{p}} = p \)
    \( \sigma_{\hat{p}} = \sqrt{\frac{p(1 - p)}{n}} \)

If x̄ is the mean of a random sample of size n from an infinite population with mean μ and standard deviation σ, then:

    \( \mu_{\bar{x}} = \mu \)
    \( \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \)

Gloria Barrett, Virginia Advanced Study Strategies, edited by Daren Starnes, January 2011

Multiple Choice Questions from 1997 Exam

Questions 19 and 20 refer to the following information:

Every Thursday, Matt and Dave’s Video Venture has “roll-the-dice” day. A customer may choose to roll two fair dice and rent a second movie for an amount (in cents) equal to the numbers uppermost on the dice, with the larger number first. For example, if the customer rolls a two and a four, a second movie may be rented for $0.42. If a two and a two are rolled, a second movie may be rented for $0.22. Let X represent the amount paid for a second movie on roll-the-dice day. The expected value of X is $0.47 and the standard deviation of X is $0.15.

19. If a customer rolls the dice and rents a second movie every Thursday for 20 consecutive weeks, what is the total amount that the customer would expect to pay for these second movies?
(A) $0.45  (B) $0.47  (C) $0.67  (D) $3.00  (E) $9.40

20. If a customer rolls the dice and rents a second movie every Thursday for 30 consecutive weeks, what is the approximate probability that the total amount paid for these second movies will exceed $15.00?
(A) 0  (B) 0.09  (C) 0.14  (D) 0.86  (E) 0.91

Multiple Choice Questions from 2002 Exam

18. Which of the following statements is (are) true about the t-distribution with k degrees of freedom? I.
The t-distribution is symmetric.
II. The t-distribution with k degrees of freedom has a smaller variance than the t-distribution with k + 1 degrees of freedom.
III. The t-distribution has a larger variance than the standard normal (z) distribution.
(A) I only  (B) II only  (C) III only  (D) I and II  (E) I and III

30. The population {2, 3, 5, 7} has mean μ = 4.25 and standard deviation σ = 1.92. When sampling with replacement, there are 16 different possible ordered samples of size 2 that can be selected from this population. The mean of each of these 16 samples is computed. For example, 1 of the 16 samples is (2, 5), which has a mean of 3.5. The distribution of the 16 sample means has its own mean \( \mu_{\bar{x}} \) and its own standard deviation \( \sigma_{\bar{x}} \). Which of the following statements is true?
(A) \( \mu_{\bar{x}} = 4.25 \) and \( \sigma_{\bar{x}} = 1.92 \)
(B) \( \mu_{\bar{x}} = 4.25 \) and \( \sigma_{\bar{x}} > 1.92 \)
(C) \( \mu_{\bar{x}} = 4.25 \) and \( \sigma_{\bar{x}} < 1.92 \)
(D) \( \mu_{\bar{x}} > 4.25 \)
(E) \( \mu_{\bar{x}} < 4.25 \)

36. An urn contains exactly three balls numbered 1, 2, and 3, respectively. Random samples of two balls are drawn from the urn with replacement. The average, \( \bar{X} = \frac{X_1 + X_2}{2} \), where X1 and X2 are the numbers on the selected balls, is recorded after each drawing. Which of the following describes the sampling distribution of X̄?
(A)-(D) [graphs of candidate sampling distributions, not reproduced]
(E) It cannot be determined from the information given.

38. Suppose that public opinion in a large city is 65 percent in favor of increasing taxes to support the public school system and 35 percent against such an increase. If a random sample of 500 people from this city are interviewed, what is the approximate probability that more than 200 of these people will be against increasing taxes?
(A)-(E) [answer choices not reproduced]

AP Exam Free Response Questions for Practice and Discussion

2008, Form B, #2

Four different statistics have been proposed as estimators of a population parameter.
To investigate the behavior of these estimators, 500 random samples are selected from a known population and each statistic is calculated for each sample. The true value of the population parameter is 75. The graphs below show the distribution of the values for each statistic.

[Graphs of the distributions of statistics A, B, C, and D, not reproduced]

(a) Which of the statistics appear to be unbiased estimators of the population parameter? How can you tell?
(b) Which of the statistics, A or B, would be a better estimator of the population parameter? Explain your choice.
(c) Which of the statistics, C or D, would be a better estimator of the population parameter? Explain your choice.

1998, #1

Consider the sampling distribution of a sample mean obtained by random sampling from an infinite population. This population has a distribution that is highly skewed toward the larger values.

(a) How is the mean of the sampling distribution related to the mean of the population?
(b) How is the standard deviation of the sampling distribution related to the standard deviation of the population?
(c) How is the shape of the sampling distribution affected by the sample size?

2004, Form B #3

Trains carry bauxite from a mine in Canada to an aluminum processing plant in northern New York State in hopper cars. Filling equipment is used to load ore into the hopper car. When functioning properly, the actual weights of ore loaded into each car by the filling equipment at the mine are approximately normally distributed with a mean of 70 tons and a standard deviation of 0.9 ton. If the mean is greater than 70 tons, the loading mechanism is overfilling.

(a) If the filling equipment is functioning properly, what is the probability that the weight of the ore in a randomly selected car will be 70.7 tons or more? Show your work.
(b) Suppose that the weight of ore in a randomly selected car is 70.7 tons.
Would that fact make you suspect that the loading mechanism is overfilling the cars? Justify your answer.
(c) If the filling equipment is functioning properly, what is the probability that a random sample of 10 cars will have a mean weight of 70.7 tons or more? Show your work.
(d) Based on your answer in part (c), if a random sample of 10 cars had a mean ore weight of 70.7 tons, would you suspect that the loading mechanism was overfilling the cars? Justify your answer.

Sampling distribution of p̂ problem

Imagine a very large candy machine filled with orange, brown, and yellow candies. The company that fills the machine says that 45% of the candies in the machine are orange. Assume for the moment that this claim is true, and that the machine has just been filled. When you insert money, the machine dispenses a random sample of 25 candies. Let p̂ = the proportion of orange candies in your sample.

(a) What is the mean of the sampling distribution of the sample proportion p̂? Explain.
(b) Find the standard deviation of the sampling distribution of p̂. Show your work.
(c) Explain why it would be appropriate to use a Normal distribution to approximate the sampling distribution of p̂ in this setting.
(d) Use a Normal distribution to find the approximate probability that the proportion of orange candies in your sample will be less than or equal to 0.36. Show your method clearly.
(e) If your sample actually contained 9 orange candies, would that make you suspect that the company isn’t putting enough orange candies in the machine? Justify your answer.

2007, #3

Big Town Fisheries recently stocked a new lake in a city park with 2,000 fish of various sizes. The distribution of the lengths of these fish is approximately normal.
(a) Big Town Fisheries claims that the mean length of the fish is 8 inches. If this claim is true, which of the following would be more likely?
    A random sample of 15 fish having a mean length that is greater than 10 inches
    or
    A random sample of 50 fish having a mean length that is greater than 10 inches
Justify your answer.
(b) Suppose the standard deviation of the sampling distribution of the sample mean for random samples of size 50 is 0.3 inch. If the mean length of the fish is 8 inches, use the normal distribution to compute the probability that a random sample of 50 fish will have a mean length less than 7.5 inches.
(c) Suppose the distribution of fish lengths in this lake was nonnormal but had the same mean and standard deviation. Would it still be appropriate to use the normal distribution to compute the probability in (b)? Justify your answer.

2007, Form B #2

The graph below shows the relative frequency distribution for X, the total number of dogs and cats owned per household, for the households in a large suburban area. For instance, 14 percent of the households own 2 of those pets.

[Graph of the relative frequency distribution of X, not reproduced]

(a) According to local law, each household in this area is prohibited from owning more than 3 of these pets. If a household in this area is selected at random, what is the probability that the selected household will be in violation of this law? Show your work.
(b) If 10 households in this area are selected at random, what is the probability that exactly 2 of them will be in violation of this law? Show your work.
(c) The mean and standard deviation of X are 1.65 and 1.851, respectively. Suppose that 150 households in this area are to be selected at random and X̄, the mean number of dogs and cats per household, is to be computed. Describe the sampling distribution of X̄, including its shape, center, and spread.
Solution, 2008 Form B Question 2

(a) Statistics A, C, and D appear to be unbiased. This is indicated by the fact that the mean of the estimated sampling distribution for each of these statistics is about 75, the value of the population parameter.
Note: No other characteristic should be mentioned in the response. Students must clearly demonstrate an understanding of the term unbiased.
(b) Statistic A would be a better choice because it appears to be unbiased (or centered at 75). Although the variability of the two estimated sampling distributions is similar, statistic A would produce estimates that tend to be closer to the true population parameter value of 75 than would statistic B.
(c) Statistic C would be a better choice because it has smaller variability. Although both statistic C and statistic D appear to be unbiased, statistic C would produce estimates that tend to be closer to the true population parameter value of 75 than would statistic D.

Solution, 1998 Question 1

(a) The mean of the sampling distribution is equal to the mean of the population.
Note: There were a number of papers that had responses containing “the sample mean is close to” or “gets close to the population mean as n increases,” or other rewordings of the law of large numbers. These statements, while true, do not answer the question posed.
(b) The standard deviation of the sampling distribution is equal to the standard deviation of the population divided by the square root of the sample size.
OR
Clearly states that the standard deviation of the sampling distribution decreases as n increases.
(c) The equivalent of the following two statements must be included:
1. The sampling distribution is skewed for small sample sizes. (A statement that does not use the term skewed but says the distribution will be non-normal is OK.)
2.
The shape of the sampling distribution gets more and more normal-like (bell shaped) as the sample size increases.
NOTE: Standard symbols were acceptable without explanation, but on this and all free response questions, non-standard symbols must be defined.

Solution, 2004 Form B Question 3

Let X = weight of ore in a randomly selected car.

(a) \( P(X \ge 70.7) = P\left(Z \ge \frac{70.7 - 70}{0.9}\right) = P(Z \ge 0.78) = 0.2177 \)
(b) No. Approximately 22% of the cars will have ore weights of 70.7 tons or greater when the filling equipment is working properly, so a car that was filled with 70.7 tons of ore would not be an unusual occurrence.
(c) \( P(\bar{X} \ge 70.7) = P\left(Z \ge \frac{70.7 - 70}{0.9/\sqrt{10}}\right) = P\left(Z \ge \frac{0.7}{0.285}\right) = P(Z \ge 2.46) = 0.0069 \)
(d) Yes, we would suspect that the filling mechanism is overfilling. If it is working properly, the probability that the mean weight of the ore in 10 randomly selected cars is 70.7 tons or greater is 0.0069, which is very small.

Note 1: To receive complete credit for part (a) or part (c), students must show how the probability is computed. Since part (a) and part (c) involve different normal distributions, it is important to identify which normal distribution is used in each part. As shown above, this could be done by displaying a probability statement containing the mean and standard deviation for the appropriate normal distribution. It could be done in other ways, such as listing the mean and standard deviation and displaying an appropriate graph.
Note 2: The response in part (b) could be justified by indicating that 70.7 tons is less than one standard deviation away from the desired mean of 70 tons. The response in part (d) could be justified by indicating that 70.7 tons is more than two standard deviations above the desired mean of 70 tons.

Solution, 2007 Form B Question 2

(a) \( P(X > 3) = 0.07 + 0.04 + 0.04 + 0.02 = 0.17 \)
(b) Y = number of households in violation. Y has a binomial distribution with n = 10 and p = 0.17.
\( P(Y = 2) = \binom{10}{2} (0.17)^2 (0.83)^8 = 0.2929 \)
(c) The distribution of X̄ will:
1. be approximately normal (note that the word approximately is required for an essentially correct response) OR be more symmetric than the population distribution, which is highly skewed.
2. have mean \( \mu_{\bar{X}} = 1.65 \)
3. have standard deviation \( \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{1.851}{\sqrt{150}} = 0.1511 \)

Solution, 2007 Question 3

(a) The random sample of n = 15 fish is more likely to have a sample mean length greater than 10 inches. The sampling distribution of the sample mean x̄ is normal with mean μ = 8 and standard deviation \( \sigma/\sqrt{n} \). Thus, both sampling distributions will be centered at 8 inches, but the sampling distribution of the sample mean when n = 15 will have more variability than the sampling distribution of the sample mean when n = 50. The tail area x̄ > 10 will be larger for the distribution that is less concentrated about the mean of 8 inches, which is the one for sample size n = 15.

[Graph comparing the two sampling distributions, not reproduced]

(b) \( P(\bar{x} < 7.5) = P\left(Z < \frac{7.5 - 8}{0.3}\right) = P(Z < -1.67) \approx 0.0475 \)
(c) Yes. The Central Limit Theorem says that the sampling distribution of the sample mean will become approximately normal as the sample size increases. Since the sample size is reasonably large (n = 50), the calculation in part (b) will provide a good approximation to the probability of interest even though the population is nonnormal.

Solution to proportion problem

(a) The sample proportion p̂ is an unbiased estimator of the population proportion p, and so the mean of the sampling distribution of p̂ will be 0.45.
(b) As long as the “large” candy machine contains at least 250 candies (10 times the sample size of 25), we should be safe using the formula for the standard deviation of the sampling distribution of p̂, even though we are sampling without replacement from a finite population:
    \( \sigma_{\hat{p}} = \sqrt{\frac{p(1 - p)}{n}} = \sqrt{\frac{0.45(0.55)}{25}} = 0.0995 \)
(c) Both np = 25(0.45) = 11.25 and n(1 − p) = 25(0.55) = 13.75 are at least 10, so a Normal approximation is appropriate.
(d) A sample proportion of p̂ = 0.36 corresponds to a z-score of \( z = \frac{0.36 - 0.45}{0.0995} = -0.90 \), so \( P(\hat{p} \le 0.36) = P(z \le -0.90) = 0.1841 \).
(e) About 1 in every 5 times, you would get 9 or fewer orange candies just by chance if the company’s claim is true. This isn’t strong enough evidence to convince you that the company is putting too few orange candies in the machine.
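The arithmetic in parts (b) and (d) of the proportion problem can be checked numerically. The following is a minimal sketch using only the Python standard library; it is not part of the original solution. The variable names are illustrative, and the exact binomial calculation at the end is an added comparison that assumes the count of orange candies is binomial with n = 25 and p = 0.45.

```python
import math
from statistics import NormalDist

p = 0.45         # claimed proportion of orange candies
n = 25           # candies dispensed per sample
cutoff = 9 / n   # 9 orange candies corresponds to p-hat = 0.36

# Mean and standard deviation of the sampling distribution of p-hat
mean_phat = p
sd_phat = math.sqrt(p * (1 - p) / n)

# Normal approximation for P(p-hat <= 0.36)
z = (cutoff - mean_phat) / sd_phat
normal_prob = NormalDist().cdf(z)            # using the unrounded z-score
table_prob = NormalDist().cdf(round(z, 2))   # z rounded to 2 places, as in a table lookup

# Exact binomial probability of 9 or fewer orange candies (added comparison)
exact_prob = sum(
    math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(10)
)

print(f"sd of p-hat      = {sd_phat:.4f}")
print(f"z-score          = {z:.2f}")
print(f"normal (table z) = {table_prob:.4f}")
print(f"normal (exact z) = {normal_prob:.4f}")
print(f"exact binomial   = {exact_prob:.4f}")
```

With z rounded to two decimal places, the Normal probability should agree with the 0.1841 reported in part (d). The exact binomial probability is expected to come out somewhat larger, because the Normal calculation here uses no continuity correction; the conclusion in part (e) does not depend on which of the two values is used.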