Name: ____________________________________________________________
School: ___________________________________________________________
Instructions:
1. No electronic devices other than an approved calculator are permitted; this includes cell phones and dictionaries. If you brought a cell phone or an electronic device other than an approved calculator, please turn it off and put it away.
2. Calculators may not be shared.
For Scoring Use Only:
Question 1 2 3 4 5 6
Score
DO NOT BEGIN UNTIL INSTRUCTED TO DO SO.
AP ® STATISTICS
SECTION II
Part A
Questions 1-5
Spend about 65 minutes on this part of the exam.
Percent of Section II grade: 75
Directions: Show all your work. Indicate clearly the methods you use, because you will be scored on the correctness of your methods as well as on the accuracy and completeness of your results and explanations.
1. At a recent country music awards show, the diversity of the gender and ages of the award recipients was observed. The ages of the first thirty males and the first thirty females to receive an award (including group awards) were recorded. Boxplots of the ages by gender of these award recipients are presented in the figure below.
(a) Write a few sentences to compare the distributions of ages for male and female award recipients.
AP ® is a trademark registered by the College Board, which was not involved in the production of, and does not endorse, this exam.
(b) The male award winner who was 74 years old was identified as a potential outlier. If this award winner had been 54 instead of 74 years old, what effect would this decrease have on the following statistics? Justify your answers.
The interquartile range of male ages:
The standard deviation of male ages:
2. An elementary school principal is interested in determining whether obesity is a common problem among the children who attend the school. Body Mass Index (BMI) is a measure of relative weight based on an individual's mass and height, and is considered a better measure of obesity than weight alone. High values can be used to indicate obesity. The BMI assessment takes some time to complete so it will not be feasible to assess every child. Some local university researchers are willing to help complete the BMI assessments for a sample of children in the school.
The school is organized in wings as depicted below. Each wing contains four classrooms from the same grade level. There are a total of 24 classrooms across grades K through 5. Each classroom contains 20 children.
[Figure: floor plan of the school showing classrooms 1 through 24 arranged in six wings: the Kindergarten Wing, the 1st Grade Wing, the 2nd Grade Wing, the 3rd Grade Wing, the 4th Grade Wing, and the 5th Grade Wing.]
(a) For convenience, the researchers want to use a cluster sampling method, in which the classrooms are clusters. Their goal is to assess 60 children. Describe a process for randomly selecting classrooms and identifying the sample of children using this method.
(b) One of the researchers suggests that an alternative sampling method would be to select a stratified random sample. Describe a process for randomly selecting 60 children using this sampling strategy and justify your selection of the stratification variable.
(c) In the context of this situation, give one statistical advantage of using a stratified random sampling method as opposed to a cluster sampling method that uses classrooms as clusters.
3. A softball player wants to investigate the number of times she can hit the ball in successive attempts. The player plans to swing at every pitch so that she will either hit the ball or miss it on each attempt. Assume the player hits the ball 25% of the time and that her attempts are independent.
(a) What is the probability the player gets her third hit on the fourth attempt?
The player continued her inquiry for several weeks. Each day the player was given up to six attempts to get three hits. Let the random variable X represent the number of attempts required for the player to get three hits given that she is successful at hitting the ball three times in her six attempts. The table below gives the observed relative frequencies for each possible value of X.

x                            3      4      5      6
Observed Probability of x    0.09   0.21   0.31   0.39
(b) Use the given relative frequencies to find the mean and standard deviation of X .
(c) Two spectators who are watching the softball player decide to set up a bet between themselves. Spectator A believes that, the next time the player is successful at getting three hits in her six attempts, it will take only 3 or 4 attempts, and spectator B believes that it will take 5 or 6 attempts. The spectators have agreed that spectator A will receive $20 if she is correct. Based on the observed relative frequencies of X, how much should spectator B receive to ensure this is a fair bet? A fair bet is one in which both parties have the same expected winnings.
4. Harpy eagles live in the rain forests of Central and South America and are one of the largest species of eagle. For many birds of prey, females are larger than males. A biologist is interested in investigating whether this phenomenon is also true of the harpy eagle. She selects random samples of 8 female harpy eagles and 9 male harpy eagles. The weights of the eagles, in pounds, were recorded, as shown in the table below.
                     Weight of Eagle                                 Mean     Standard Deviation
Females (n_F = 8)    13.5 14.1 15.1 16.0 17.6 14.9 14.8 13.6         14.95    1.353
Males (n_M = 9)      13.4 13.9 13.5 12.8 14.5 12.7 13.6 13.1 12.2    13.30    0.689
Do the data provide convincing evidence that the mean weight of female harpy eagles is greater than the mean weight of male harpy eagles?
If you need more room for your work in question 4, use the space below.
5. The U.S. speed skaters heading to the 2018 Pyeongchang Winter Olympics needed to investigate reasons for the team's poor showing at the 2014 Sochi Winter Olympics. There were many potential reasons presented to explain this poor performance, including that the U.S. speed skating team trained in Collalbo, Italy at elevation 3,792 ft but Sochi, Russia is at sea level (elevation 0 ft).
(a) In the months leading up to the Sochi Olympics, the U.S. speed skaters competed in many international skating competitions to prepare for the Olympic Games. One of these competitions was held in Salt Lake City, Utah (elevation 4,330 ft) in November 2013 and another in Berlin, Germany (elevation 164 ft) in December 2013. For these two competitions the times for 14 athletes in the men's 1500m were recorded in both races. The difference in finish times was calculated for each athlete and summary statistics are shown below.
Salt Lake City – Berlin
Mean: −3.37 seconds
Standard Deviation: 0.87 seconds
Assuming that all conditions for inference have been met, construct a 95% confidence interval for the mean difference in 1500m times between the Salt Lake City and Berlin competitions.
Does this interval suggest that there is a difference in mean finish times when two competitions are held at different elevations? Explain.
(b) The completion times for 15 athletes who competed in both the Berlin event and in the 1500m final at the Sochi Olympics (two cities with similar elevation) were also compared and the difference in finish times (Berlin – Sochi) was calculated for each athlete. From these data, a 95% confidence interval for the mean difference in 1500m times between the Berlin and Sochi competitions is 0.16 ± 0.68 seconds. Assuming that all conditions for inference have been met, does this interval suggest that there is a difference in mean finish times when two competitions are held at similar elevations? Explain.
A scatterplot showing the finishing times by country for the 10 athletes who competed in all three events is shown below. Shani Davis, the U.S. speed skater whose position in the plot is circled, won silver medals in the men's 1500m during both the 2006 and 2010 Olympics but finished 11th in Sochi. He is also set to compete in the 2018 Pyeongchang Winter Olympics.
Scatterplot: Change in Times for Athletes Who Completed All Three Events
(c) Use the scatterplot to compare Shani's finish times across all three competitions (Salt Lake at elevation 4,330 ft, Berlin at elevation 164 ft, and Sochi at elevation 0 ft) and comment on the difference in Shani's performance when competing in cities with different elevations compared to his performance when competing in cities with similar elevations.
(d) Using all the information in this question, what recommendation would you make to the U.S. Speed Skating Association regarding the elevations of the cities where the team trains for future Olympics?
AP ® STATISTICS
SECTION II
Part B
Question 6
Spend about 25 minutes on this part of the exam.
Percent of Section II grade: 25
Directions: Show all your work. Indicate clearly the methods you use, because you will be scored on the correctness of your methods as well as on the accuracy and completeness of your results and explanations.
6. A researcher is trying to estimate the unknown proportion p of individuals with a rare genetic trait in a particular population. The researcher will take a sample of n individuals from this population and count the number of individuals, X, in the sample that possess this genetic trait.
Suppose that a sample consists of n = 20 trials from this binomial process with success probability p .
In other words, let the random variable X , the number who possess this genetic trait, have a binomial probability distribution with parameters n = 20 and unknown success probability p .
(a) Suppose the population proportion of individuals who possess this genetic trait is 0.03. A simulation was conducted in which 1,000 random samples of size n = 20 were taken from this population and the point estimate $\hat{p} = X/n$ calculated. The histogram below displays the distribution of the 1,000 simulated sample statistics, $\hat{p}$.
Summary Statistics
Mean of $\hat{p}$ values: 0.030
Std Dev of $\hat{p}$ values: 0.038
[Histogram: Simulated Values of $\hat{p}$]
Based on these simulation results, does it appear that $\hat{p}$ is an unbiased estimator of the population proportion p of individuals who possess this genetic trait? Explain.
(b) For each of the 1,000 sample proportions obtained in part (a), a 95% confidence interval for the population proportion p of individuals who possess this genetic trait was constructed using the usual one-proportion z -interval formula. Of the 1,000 intervals, 469 or 46.9% succeeded in capturing the population proportion p of individuals who possess this genetic trait within the endpoints of the interval. Explain why the proportion of intervals in the simulation that succeeded in capturing the parameter p was much less than 95%.
Another estimator for p that can provide a statistical advantage over the conventional estimator $\hat{p}$ in certain situations is the following: $\tilde{p} = \frac{X + 2}{n + 4}$.
(c) Carry out the calculations below to investigate the relationship between $\hat{p}$ and $\tilde{p}$.
(i) Suppose that the sample results in 5 individuals who possess this genetic trait among the 20 trials. Determine the values of $\hat{p}$ and $\tilde{p}$.
(ii) Suppose now that the sample results in 12 successes among the 20 trials. Determine the values of $\hat{p}$ and $\tilde{p}$.
(iii) Are there any sample results for which the values of $\hat{p}$ and $\tilde{p}$ would be the same? Justify your answer.
(d) A simulation was conducted in which 1,000 random samples of size n = 20 were taken from this population and the point estimate $\tilde{p} = \frac{X + 2}{n + 4}$ calculated. The histogram below displays the distribution of the 1,000 simulated sample statistics, $\tilde{p}$.
Summary Statistics
Mean of $\tilde{p}$ values: 0.107
Std Dev of $\tilde{p}$ values: 0.031
[Histogram: Simulated Values of $\tilde{p}$]
For each of the 1,000 sample proportions obtained, a 95% confidence interval for the population proportion p of individuals who possess this genetic trait was constructed using $\tilde{p}$ in place of $\hat{p}$ in the usual one-proportion z-interval formula and n + 4 in place of n. Of the 1,000 intervals, 976 or 97.6% succeeded in capturing the population proportion p of individuals who possess this genetic trait within the endpoints of the interval.
Based on these simulation results, does it appear that $\tilde{p}$ is an unbiased estimator of the population proportion p of individuals who possess this genetic trait? Explain.
(e) Based on comparing the summary statistics for the simulation results in parts (a) and (d), state a statistical advantage of the estimator $\hat{p}$.
(f) Based on comparing the summary statistics for the simulation results in parts (a) and (d), state a statistical advantage of the estimator $\tilde{p}$. Explain why this statistical advantage makes sense, given that the new statistic $\tilde{p}$ is calculated by adding 2 to the numerator and 4 to the denominator of $\hat{p}$.
END OF EXAMINATION