Student Notes - Prep Session Topic: Sampling Distributions

Content
The AP Statistics topic outline contains the following list of items related to sampling distributions. (Items
(4), (5), and (8) will not be covered in this session.)
1. Sampling distribution of a sample proportion
2. Sampling distribution of a sample mean
3. Central Limit Theorem
4. Sampling distribution of a difference between two independent sample proportions
5. Sampling distribution of a difference between two independent sample means
6. Simulation of sampling distributions
7. t-distribution
8. Chi-square distribution
Sampling distributions are an extension of probability, so many free response questions that include
questions on sampling distributions will also include parts that relate to material discussed and reviewed in
the earlier prep session on probability.
Be sure you understand:
1. The difference between a parameter and a statistic
2. What we mean by the sampling distribution of a statistic (that is, the distribution of the values of
that statistic obtained from all possible samples of a given size from a given population)
3. What we mean by an unbiased statistic
4. The formulas for $\sigma_{\hat{p}}$ and $\sigma_{\bar{x}}$ should be used only when the population is at least 10 times as large as
the sample
5. The sampling distribution of p̂ is approximately normal when the sample size is large (your textbook
will have a definition of large, for example np ≥ 10 and n(1 − p) ≥ 10)
6. The sampling distribution of x̄ is normally distributed, regardless of sample size, if the underlying
population is normally distributed
7. The sampling distribution of x̄ is approximately normally distributed, regardless of the shape of the
underlying population, when the sample size is large (according to the Central Limit Theorem). In
this case n ≥ 30 is usually sufficiently large.
8. The CLT is a statement about shape. It says that the sampling distribution of sample means
becomes more normally distributed as the sample size increases.
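The points above about center, spread, and shape can be explored by simulation. The following is a minimal sketch, not part of the original notes: it assumes a hypothetical right-skewed population (exponential with mean 1 and standard deviation 1), and the function names are illustrative only. It draws many random samples of size n, records each sample mean, and compares the simulated mean and standard deviation with μ and σ/√n. The skewness value gives a rough sense of how the shape becomes more symmetric as n grows.

```python
import random
import statistics
from math import sqrt

def draw_skewed(rng):
    """One value from a hypothetical right-skewed population (exponential, mean 1, sd 1)."""
    return rng.expovariate(1.0)

def simulate_sample_means(n, reps=5000, seed=1):
    """Simulate `reps` random samples of size n and return their sample means."""
    rng = random.Random(seed)
    return [statistics.fmean([draw_skewed(rng) for _ in range(n)]) for _ in range(reps)]

def skewness(values):
    """Rough skewness measure: near 0 for a symmetric distribution."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean([((v - m) / s) ** 3 for v in values])

results = {}
for n in (2, 30):
    means = simulate_sample_means(n)
    results[n] = (statistics.fmean(means), statistics.stdev(means), skewness(means))
    print(f"n={n}: mean={results[n][0]:.3f}, sd={results[n][1]:.3f}, "
          f"sigma/sqrt(n)={1 / sqrt(n):.3f}, skewness={results[n][2]:.2f}")
```

In runs of this sketch, the simulated mean should stay near 1 for both sample sizes, the simulated standard deviation should track σ/√n, and the skewness should be noticeably smaller for n = 30 than for n = 2.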
Formulas
You will want to be familiar with the probability formulas that are provided on the exam. A partial list of
formulas related to probability on the exam formula sheet is provided here. Note that several relate to the
sampling distribution of sample means and sample proportions:
Gloria Barrett, Virginia Advanced Study Strategies
edited by Daren Starnes
January, 2011
If X has a binomial distribution with parameters n and p, then:
$P(X = k) = \binom{n}{k} p^k (1 - p)^{n - k}$
$\mu_X = np$
$\sigma_X = \sqrt{np(1 - p)}$
$\mu_{\hat{p}} = p$
$\sigma_{\hat{p}} = \sqrt{\frac{p(1 - p)}{n}}$
If x̄ is the mean of a random sample of size n from an infinite population with mean μ and
standard deviation σ, then:
$\mu_{\bar{x}} = \mu$
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
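For checking hand calculations, the formula-sheet expressions can be written as small functions. This is an illustrative sketch only: the function names are made up for this example, and the values n = 20 and p = 0.3 are arbitrary assumptions, not taken from any exam question.

```python
import math

def binomial_pmf(k, n, p):
    """P(X = k) for a binomial random variable with parameters n and p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def binomial_mean(n, p):
    """Mean of a binomial random variable: np."""
    return n * p

def binomial_sd(n, p):
    """Standard deviation of a binomial random variable: sqrt(np(1 - p))."""
    return math.sqrt(n * p * (1 - p))

def sd_sample_proportion(p, n):
    """Standard deviation of the sampling distribution of p-hat."""
    return math.sqrt(p * (1 - p) / n)

def sd_sample_mean(sigma, n):
    """Standard deviation of the sampling distribution of x-bar."""
    return sigma / math.sqrt(n)

# Arbitrary illustrative values (assumptions, not from the notes).
n, p = 20, 0.3
print(binomial_pmf(6, n, p), binomial_mean(n, p), binomial_sd(n, p))
print(sd_sample_proportion(p, n), sd_sample_mean(10, 25))
```

On the exam these values are computed by hand or calculator. The sketch is only a way to confirm arithmetic while practicing.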
Multiple Choice Questions from 1997 Exam
Questions 19 and 20 refer to the following information:
Every Thursday, Matt and Dave’s Video Venture has “roll-the-dice” day. A customer may choose to roll two
fair dice and rent a second movie for an amount (in cents) equal to the numbers uppermost on the dice, with
the larger number first. For example, if the customer rolls a two and a four, a second movie may be rented for
$0.42. If a two and a two are rolled, a second movie may be rented for $0.22. Let X represent the amount
paid for a second movie on roll-the-dice day. The expected value of X is $0.47 and the standard deviation of
X is $0.15.
19. If a customer rolls the dice and rents a second movie every Thursday for 20 consecutive weeks, what is
the total amount that the customer would expect to pay for these second movies?
(A) $0.45
(B) $0.47
(C) $0.67
(D) $3.00
(E) $9.40
20. If a customer rolls the dice and rents a second movie every Thursday for 30 consecutive weeks, what is
the approximate probability that the total amount paid for these second movies will exceed $15.00?
(A) 0
(B) 0.09
(C) 0.14
(D) 0.86
(E) 0.91
Multiple Choice Questions from 2002 Exam
18. Which of the following statements is (are) true about the t-distribution with k degrees of
freedom?
I. The t-distribution is symmetric.
II. The t-distribution with k degrees of freedom has a smaller variance than the t-distribution
with k + 1 degrees of freedom.
III. The t-distribution has a larger variance than the standard normal (z) distribution.
(A) I only
(B) II only
(C) III only
(D) I and II
(E) I and III
30. The population {2, 3, 5, 7} has mean μ = 4.25 and standard deviation σ = 1.92. When sampling with
replacement, there are 16 different possible ordered samples of size 2 that can be selected from this
population. The mean of each of these 16 samples is computed. For example, 1 of the 16 samples is (2, 5),
which has a mean of 3.5. The distribution of the 16 sample means has its own mean $\mu_{\bar{x}}$ and its own standard
deviation $\sigma_{\bar{x}}$. Which of the following statements is true?
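A) $\mu_{\bar{x}}$ __ 4.25 and $\sigma_{\bar{x}}$ __ 1.92
B) $\mu_{\bar{x}}$ __ 4.25 and $\sigma_{\bar{x}}$ __ 1.92
C) $\mu_{\bar{x}}$ __ 4.25 and $\sigma_{\bar{x}}$ __ 1.92
D) $\mu_{\bar{x}}$ __ 4.25
E) $\mu_{\bar{x}}$ __ 4.25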
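Because this population is so small, the sampling distribution in question 30 can be listed in full. The sketch below is an illustration added for study purposes, not part of the exam. It enumerates all 16 ordered samples of size 2 drawn with replacement and computes the mean and standard deviation of the 16 sample means, for comparison with μ, σ, and σ/√n. Variable names are illustrative.

```python
from itertools import product
from math import sqrt
from statistics import fmean, pstdev

population = [2, 3, 5, 7]

# All ordered samples of size 2, sampling with replacement.
samples = list(product(population, repeat=2))
sample_means = [fmean(s) for s in samples]

mu = fmean(population)         # population mean
sigma = pstdev(population)     # population standard deviation
mu_xbar = fmean(sample_means)      # mean of the 16 sample means
sigma_xbar = pstdev(sample_means)  # standard deviation of the 16 sample means

print(len(samples), round(mu, 2), round(sigma, 2))
print(round(mu_xbar, 2), round(sigma_xbar, 2), round(sigma / sqrt(2), 2))
```

The last line prints the standard deviation of the sample means next to σ/√2, so the two can be compared directly.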
36. An urn contains exactly three balls numbered 1, 2, and 3, respectively. Random samples of two
balls are drawn from the urn with replacement. The average, $\bar{X} = \frac{X_1 + X_2}{2}$, where $X_1$ and $X_2$ are the
numbers on the selected balls, is recorded after each drawing. Which of the following describes the
sampling distribution of $\bar{X}$?
A)
B)
C)
D)
E) It cannot be determined from the information given.
38. Suppose that public opinion in a large city is 65 percent in favor of increasing taxes to support the public
school system and 35 percent against such an increase. If a random sample of 500 people from this city are
interviewed, what is the approximate probability that more than 200 of these people will be against increasing
taxes?
A)
B)
C)
D)
E)
AP Exam Free Response Questions for Practice and Discussion
2008, Form B, #2
Four different statistics have been proposed as estimators of a population parameter. To investigate the
behavior of these estimators, 500 random samples are selected from a known population and each statistic
is calculated for each sample. The true value of the population parameter is 75. The graphs below show the
distribution of the values for each statistic.
(a) Which of the statistics appear to be unbiased estimators of the population parameter?
How can you tell?
(b) Which of the statistics A or B would be a better estimator of the population parameter?
Explain your choice.
(c) Which of the statistics C or D would be a better estimator of the population parameter?
Explain your choice.
1998, #1
Consider the sampling distribution of a sample mean obtained by random sampling from an infinite
population. This population has a distribution that is highly skewed toward the larger values.
(a) How is the mean of the sampling distribution related to the mean of the population?
(b) How is the standard deviation of the sampling distribution related to the standard deviation of the
population?
(c) How is the shape of the sampling distribution affected by the sample size?
2004, Form B #3
Trains carry bauxite from a mine in Canada to an aluminum processing plant in northern New York
State in hopper cars. Filling equipment is used to load ore into the hopper car. When functioning properly,
the actual weights of ore loaded into each car by the filling equipment at the mine are approximately
normally distributed with a mean of 70 tons and a standard deviation of 0.9 ton. If the mean is greater than
70 tons, the loading mechanism is overfilling.
(a) If the filling equipment is functioning properly, what is the probability that the weight of the ore in a
randomly selected car will be 70.7 tons or more? Show your work.
(b) Suppose that the weight of ore in a randomly selected car is 70.7 tons. Would that fact make you suspect
that the loading mechanism is overfilling the cars? Justify your answer.
(c) If the filling equipment is functioning properly, what is the probability that a random sample of 10 cars
will have a mean weight of 70.7 tons or more? Show your work.
(d) Based on your answer in part (c), if a random sample of 10 cars had a mean ore weight of 70.7 tons,
would you suspect that the loading mechanism was overfilling the cars? Justify your answer.
Sampling distribution of p̂ problem
Imagine a very large candy machine filled with orange, brown, and yellow candies. The company that fills
the machine says that 45% of the candies in the machine are orange. Assume for the moment that this
claim is true, and that the machine has just been filled. When you insert money, the machine dispenses a
random sample of 25 candies. Let p̂ = the proportion of orange candies in your sample.
(a) What is the mean of the sampling distribution of the sample proportion p̂ ? Explain.
(b) Find the standard deviation of the sampling distribution of p̂ . Show your work.
(c) Explain why it would be appropriate to use a Normal distribution to approximate the sampling
distribution of p̂ in this setting.
(d) Use a Normal distribution to find the approximate probability that the proportion of orange candies in
your sample will be less than or equal to 0.36. Show your method clearly.
(e) If your sample actually contained 9 orange candies, would that make you suspect that the company isn’t
putting enough orange candies in the machine? Justify your answer.
2007, #3
Big Town Fisheries recently stocked a new lake in a city park with 2,000 fish of various sizes. The distribution
of the lengths of these fish is approximately normal.
(a) Big Town Fisheries claims that the mean length of the fish is 8 inches. If this claim is true, which of the
following would be more likely?

• A random sample of 15 fish having a mean length that is greater than 10 inches
or
• A random sample of 50 fish having a mean length that is greater than 10 inches
Justify your answer.
(b) Suppose the standard deviation of the sampling distribution of the sample mean for random samples of
size 50 is 0.3 inch. If the mean length of the fish is 8 inches, use the normal distribution to compute the
probability that a random sample of 50 fish will have a mean length less than 7.5 inches.
(c) Suppose the distribution of fish lengths in this lake was nonnormal but had the same mean and standard
deviation. Would it still be appropriate to use the normal distribution to compute the probability in (b)?
Justify your answer.
2007, Form B #2
The graph below shows the relative frequency distribution for X , the total number of dogs and cats owned
per household, for the households in a large suburban area. For instance, 14 percent of the households own
2 of those pets.
(a) According to local law, each household in this area is prohibited from owning more than 3 of these pets.
If a household in this area is selected at random, what is the probability that the selected household will be
in violation of this law? Show your work.
(b) If 10 households in this area are selected at random, what is the probability that exactly 2 of them will be
in violation of this law? Show your work.
(c) The mean and standard deviation of X are 1.65 and 1.851, respectively. Suppose that 150 households in
this area are to be selected at random and X̄, the mean number of dogs and cats per household, is to be
computed. Describe the sampling distribution of X̄, including its shape, center, and spread.
Solution, 2008 Form B Question 2
(a) Statistics A, C, and D appear to be unbiased. This is indicated by the fact that the mean of the estimated
sampling distribution for each of these statistics is about 75, the value of the population parameter.
Note: No other characteristic should be mentioned in the response. Students must clearly demonstrate an
understanding of the term unbiased.
(b) Statistic A would be a better choice because it appears to be unbiased (or centered at 75). Although the
variability of the two estimated sampling distributions is similar, statistic A would produce estimates that
tend to be closer to the true population parameter value of 75 than would statistic B.
(c) Statistic C would be a better choice because it has smaller variability. Although both statistic C and
statistic D appear to be unbiased, statistic C would produce estimates that tend to be closer to the true
population parameter value of 75 than would statistic D.
Solution, 1998 Question 1
(a) The mean of the sampling distribution is equal to the mean of the population.
Note: A number of papers had responses containing “the sample mean is close to” or “gets
close to the population mean as n increases,” or other rewordings of the law of large numbers. These
statements, while true, do not answer the question posed.
(b) The standard deviation of the sampling distribution is equal to the standard deviation of the population
divided by the square root of the sample size.
OR
Clearly states that the standard deviation of the sampling distribution decreases as n increases.
(c) The equivalent of the following two statements must be included:
1. The sampling distribution is skewed for small sample sizes. (A statement that does not use the term
skewed but says the distribution will be non-normal is OK.)
2. The shape of the sampling distribution gets more and more normal-like (bell shaped) as the sample size
increases.
NOTE: Standard symbols were acceptable without explanation, but on this and all free response questions,
non-standard symbols must be defined.
Solution, 2004 Form B Question 3
Let X = weight of ore in a randomly selected car.
(a) $P(X \ge 70.7) = P\left(Z \ge \frac{70.7 - 70}{0.9}\right) = P(Z \ge 0.78) = 0.2177$
(b) No. Approximately 22% of the cars will have ore weights of 70.7 or greater when the filling equipment is
working properly, so a car that was filled with 70.7 tons of ore would not be an unusual occurrence.
(c) $P(\bar{X} \ge 70.7) = P\left(Z \ge \frac{70.7 - 70}{0.9/\sqrt{10}}\right) = P\left(Z \ge \frac{0.7}{0.285}\right) = P(Z \ge 2.46) = 0.0069$
(d) Yes, we would suspect that the filling mechanism is overfilling. If it is working properly, the probability
that the mean weight of the ore in 10 randomly selected cars is 70.7 or greater is 0.0069 which is very small.
Note 1: To receive complete credit for part (a) or part (c), students must show how the probability is
computed. Since part (a) and part (c) involve different normal distributions, it is important to identify which
normal distribution is used in each part. As shown above, this could be done by displaying a probability
statement containing the mean and standard deviation for the appropriate normal distribution. It could be
done in other ways, such as listing the mean and standard deviation and displaying an appropriate graph.
Note 2: The response in part (b) could be justified by indicating that 70.7 tons is less than one standard
deviation away from the desired mean of 70 tons. The response in part (d) could be justified by indicating
that 70.7 tons is more than two standard deviations above the desired mean of 70 tons.
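The two probabilities in parts (a) and (c) come from two different normal distributions: one for a single car and one for the mean of 10 cars. The sketch below is an illustration added here, not part of the scoring guidelines, and it uses Python's statistics.NormalDist in place of a z-table. The results should come out close to the table-based values 0.2177 and 0.0069. Small differences are expected because the table method rounds z to two decimal places.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 70.0, 0.9   # mean and sd of ore weight per car (tons), from the question
n = 10                  # sample size in part (c)

# Part (a): weight of one randomly selected car.
single_car = NormalDist(mu, sigma)
p_single = 1 - single_car.cdf(70.7)

# Part (c): mean weight of a random sample of 10 cars; sd is sigma / sqrt(n).
mean_of_10 = NormalDist(mu, sigma / sqrt(n))
p_mean = 1 - mean_of_10.cdf(70.7)

print(round(p_single, 4), round(p_mean, 4))
```

Writing the two distributions as separate objects mirrors the point in Note 1: each part should identify which normal distribution is being used.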
Solution, 2007 Form B Question 2
(a) P(X > 3) = 0.07 + 0.04 + 0.04 + 0.02 = 0.17
(b) Y = number of households in violation. Y has a binomial distribution with n = 10 and p = 0.17.
$P(Y = 2) = \binom{10}{2} (0.17)^2 (0.83)^8 = 0.2929$
(c) The distribution of X̄ will:
1. be approximately normal (note that the word approximately is required for an essentially correct
response) OR is more symmetric than the population distribution which is highly skewed.
2. have mean $\mu_{\bar{X}} = \mu = 1.65$
3. have standard deviation $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{1.851}{\sqrt{150}} = 0.1511$
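The arithmetic in this solution can be checked with a few lines of code. This is a minimal sketch added for verification, with illustrative variable names. The probabilities and parameters are the ones stated in the solution.

```python
from math import comb, sqrt

# Part (a): probability a household owns more than 3 pets.
p_violation = 0.07 + 0.04 + 0.04 + 0.02

# Part (b): binomial probability that exactly 2 of 10 households are in violation.
p_exactly_two = comb(10, 2) * p_violation**2 * (1 - p_violation) ** 8

# Part (c): standard deviation of the sample mean for n = 150 households.
sd_xbar = 1.851 / sqrt(150)

print(round(p_violation, 2), round(p_exactly_two, 4), round(sd_xbar, 4))
```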
Solution, 2007 Question 3
(a) The random sample of n = 15 fish is more likely to have a sample mean length greater than 10 inches.
The sampling distribution of the sample mean x̄ is normal with mean μ = 8 and standard deviation $\sigma/\sqrt{n}$.
Thus, both sampling distributions will be centered at 8 inches, but the sampling distribution of the sample
means when n = 15 will have more variability than the sampling distribution of sample means when n = 50.
The tail area (x̄ > 10) will be larger for the distribution that is less concentrated about the mean of 8 inches
when the sample size is n = 15, as shown in the following graph.
(b) $P(\bar{x} < 7.5) = P\left(Z < \frac{7.5 - 8}{0.3}\right) = P(Z < -1.67) \approx 0.0475$
(c) Yes. The Central Limit Theorem says that the sampling distribution of the sample mean will become
approximately normal as the sample size increases. Since the sample size is reasonably large (n = 50), the
calculation in part (b) will provide a good approximation to the probability of interest even though the
population is nonnormal.
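As a numerical check on parts (a) and (b), the sketch below uses statistics.NormalDist in place of a z-table. It rests on one assumption that the question does not state: to compare the two sample sizes in part (a), it takes the population standard deviation to be the value implied by part (b), σ = 0.3 × √50. The qualitative conclusion in part (a) does not depend on that choice.

```python
from math import sqrt
from statistics import NormalDist

mu = 8.0            # claimed mean fish length (inches)
sd_xbar_50 = 0.3    # sd of the sample mean for n = 50, given in part (b)

# Part (b): P(sample mean < 7.5) for n = 50.
p_below = NormalDist(mu, sd_xbar_50).cdf(7.5)

# Part (a), under the ASSUMPTION that the same population applies:
# sd_xbar = sigma / sqrt(n), so sigma = 0.3 * sqrt(50).
sigma = sd_xbar_50 * sqrt(50)
p_above = {n: 1 - NormalDist(mu, sigma / sqrt(n)).cdf(10) for n in (15, 50)}

print(round(p_below, 4), p_above)
```

Both tail probabilities in part (a) are very small, but the one for n = 15 is larger, which matches the reasoning about variability in the solution.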
Solution to proportion problem
(a) The sample proportion p̂ is an unbiased estimator of the population proportion p, and so the mean of
the sampling distribution of p̂ will be 0.45.
(b) As long as the “large” candy machine contains at least 250 candies, we should be safe using the formula
for the standard deviation of the sampling distribution of p̂, even though we are sampling without
replacement from a finite population:
$\sigma_{\hat{p}} = \sqrt{\frac{p(1 - p)}{n}} = \sqrt{\frac{0.45(0.55)}{25}} = 0.0995$
(c) Both np = 25(0.45) = 11.25 and n(1 − p) = 25(0.55) = 13.75 are at least 10, so a Normal approximation is
appropriate.
(d) A sample proportion of p̂ = 0.36 corresponds to a z-score of
$z = \frac{0.36 - 0.45}{0.0995} = -0.90$
$P(\hat{p} \le 0.36) = P(z \le -0.90) = 0.1841$.
(e) No. A sample with 9 orange candies has p̂ = 9/25 = 0.36. From part (d), about 18% of the time
(roughly 1 in every 5 samples), you would get 9 or fewer orange candies
just by chance if the company’s claim is true. This isn’t strong enough evidence to convince you that the
company is putting too few orange candies in the machine.
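The steps of this solution can be retraced in code. The sketch below is an added illustration with made-up variable names. It computes the standard deviation of p̂, checks the large-sample condition, and finds the Normal-approximation probability without rounding z, so the result will differ slightly from the table value 0.1841. The last calculation, an exact binomial probability of 9 or fewer orange candies, is not part of the original solution and is included only for comparison.

```python
from math import comb, sqrt
from statistics import NormalDist

p, n = 0.45, 25   # claimed proportion of orange candies and sample size

# Part (b): standard deviation of the sampling distribution of p-hat.
sd_phat = sqrt(p * (1 - p) / n)

# Part (c): large-sample condition for the Normal approximation.
normal_ok = n * p >= 10 and n * (1 - p) >= 10

# Part (d): Normal approximation to P(p-hat <= 0.36), with z left unrounded.
p_normal = NormalDist(p, sd_phat).cdf(0.36)

# For comparison only (not in the original solution): exact binomial P(X <= 9).
p_exact = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(10))

print(round(sd_phat, 4), normal_ok, round(p_normal, 4), round(p_exact, 4))
```

The exact binomial value and the Normal approximation will not agree exactly, since the approximation here uses no continuity correction. Either way, the probability is far from small enough to cast doubt on the company's claim.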