Statistics Final Exam Review Problems

This is a selection of problems from previous midterms and final exams, along with some selected
review problems from the book. Some of these problems are repeated from the sample midterm problems.
Most, however, are new, so you should also revisit the 68 problems in the
Midterm Review Document as part of your review for the final exam. Your final exam will certainly
have different problems, but these can give you an idea of what to expect. There will also be many fewer questions
on your two-hour test!
The final exam is cumulative. About two thirds of the test will come from material since the midterm (Parts IV, V,
and VI in UNITS 3 and 4). You will be expected to use your calculator; no tables will be provided.
1. In a Marist Institute survey of 950 randomly selected Americans, 54% of the sample answered “yes” to
the question “Do you think there is intelligent life on other planets?” Use this information to calculate
a 95% confidence interval for the proportion of all Americans who believe there is intelligent life on
other planets. You must show me exactly what you’re doing.
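One way to check your arithmetic on Problem 1 is sketched below. It assumes the usual large-sample (z) interval for a proportion is appropriate here (950 × 0.54 and 950 × 0.46 are both well above 10); the helper name `proportion_ci` is made up for this illustration.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(p_hat, n, confidence):
    """Large-sample (z) confidence interval for a population proportion.

    Returns (lower bound, upper bound, margin of error).
    """
    # Critical value z* that leaves (1 - confidence)/2 in each tail.
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin, margin

low, high, margin = proportion_ci(0.54, 950, 0.95)
print(f"95% CI: ({low:.4f}, {high:.4f}), margin of error {margin:.4f}")
```

On the exam you would still need to state the conditions and the formula, not just the calculator output.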
2. Human babies can be examined in the womb using ultrasound. Animal studies have suggested that
ultrasound examinations can cause low birthweight. As ultrasound became more common, researchers
observed an association in humans as well – babies who were exposed more often to ultrasound in the
womb had lower birthweight on average than babies who were exposed less often.
(a) What are the explanatory and response variables? Is this association positive or negative?
(b) Suggest at least one lurking or confounding variable that could explain the association.
3. You are performing an experiment to test the effects of a new cold medicine.
(a) Briefly explain what “double-blind” means.
(b) Why would you want this experiment to be double-blind?
4. At a food processing facility, a machine is calibrated so that, on average, the weight of a 16-ounce can of
vegetable soup is 16.2 ounces. To test whether the machine is operating correctly or needs recalibration,
a random sample of 50 cans was taken and their weights measured. The mean weight of these 50 cans was 16.3
ounces and the standard deviation was 0.25 ounces. At the 0.01 level of significance, what conclusion can be
reached?
5. Here is a probability model for blood types of Americans:
Blood type     A      B      AB     O
Probability    ____   0.11   0.04   0.45
(a) Fill in the missing probability in the table.
(b) Suppose four Americans are chosen at random. Find the probability that they all have type O blood.
(c) Suppose four Americans are chosen at random. Find the probability that at least 2 of them have type O
blood.
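Parts (b) and (c) of Problem 5 can be checked with a short binomial calculation. This is only a sketch: it assumes the four people are chosen independently, so the number with type O blood is Binomial(4, 0.45). The function name `binom_pmf` is an illustrative choice.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(exactly k successes in n independent trials with success chance p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

p_type_o = 0.45   # from the table
n_people = 4

# (b) all four have type O
p_all_four = binom_pmf(4, n_people, p_type_o)

# (c) at least two have type O = 1 - P(none) - P(exactly one)
p_at_least_two = 1 - binom_pmf(0, n_people, p_type_o) - binom_pmf(1, n_people, p_type_o)

print(f"P(all four type O)     = {p_all_four:.4f}")
print(f"P(at least two type O) = {p_at_least_two:.4f}")
```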
6. A news article reports that of the 411 players on National Basketball Association rosters in February, only 139
“made more than the league average salary” of $2.36 million. Is $2.36 million the mean or median salary for
NBA players? How do you know?
7. To profitably produce a planned upgrade of a software product you make, you must charge customers $100. To
decide whether your customers are willing to pay this much, you contact a random sample of forty customers
and find that eleven would pay $100 for the upgrade. Construct and interpret a 92% confidence interval for the
proportion of all customers who would be willing to buy the upgrade for $100. Be sure to state the margin of
error of your confidence interval.
8. Here are the scores on the Survey of Study Habits and Attitudes (SSHA) for 18 first-year college women:
154 103 109 126 137 126 115 137 152
165 140 165 154 129 178 200 101 148
(a) Find the mean of the scores.
(b) Make a histogram of the data. (You do not have to draw the histogram for me.) Describe the shape of the
distribution. Are there any outliers?
(c) Would the mean increase, decrease, or stay the same if we drop the score of 200 from the list? (You
shouldn’t have to compute anything here.)
(d) Find the five-number summary and carefully draw the boxplot for me here. Label your boxplot.
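A quick way to check the mean and five-number summary for Problem 8 is sketched below. It assumes the quartiles are the medians of the lower and upper halves of the ordered data (calculators and software sometimes use slightly different rules), and it uses the 1.5 × IQR rule as one possible outlier check. The function name is hypothetical.

```python
from statistics import mean, median

scores = [154, 103, 109, 126, 137, 126, 115, 137, 152,
          165, 140, 165, 154, 129, 178, 200, 101, 148]

def five_number_summary(data):
    """Min, Q1, median, Q3, max, with quartiles as medians of each half."""
    ordered = sorted(data)
    half = len(ordered) // 2
    lower = ordered[:half]     # values below the median position
    upper = ordered[-half:]    # values above the median position
    return ordered[0], median(lower), median(ordered), median(upper), ordered[-1]

low, q1, med, q3, high = five_number_summary(scores)
iqr = q3 - q1
# 1.5 x IQR rule: one common convention for flagging suspected outliers
outliers = [x for x in scores if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr]

print("mean:", round(mean(scores), 2))
print("five-number summary:", low, q1, med, q3, high)
print("suspected outliers:", outliers)
```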
9. Consider a group of professional male athletes, half of whom are gymnasts and half of whom are basketball
players. Would you expect a distribution of their heights to be uniform, unimodal, or bimodal? Explain why.
10. Suppose you roll two fair dice (one red, one green). The table shows every possible outcome. Each of the 36
rolls is equally likely.
Red die → (1 to 6 across), Green die ↓ (1 to 6 down)
1, 1   1, 2   1, 3   1, 4   1, 5   1, 6
2, 1   2, 2   2, 3   2, 4   2, 5   2, 6
3, 1   3, 2   3, 3   3, 4   3, 5   3, 6
4, 1   4, 2   4, 3   4, 4   4, 5   4, 6
5, 1   5, 2   5, 3   5, 4   5, 5   5, 6
6, 1   6, 2   6, 3   6, 4   6, 5   6, 6
Let A be the event of rolling an 8 on the pair (that is, the sum of the two dice is 8).
Let B be the event of rolling a 5 on the red die.
Let C be the event of rolling a 1 on the red die.
(a) Find P(A).
(b) Find P(A | B).
(c) Are A and B independent events? Why or why not?
(d) Are B and C independent events?
(e) Find P(B or C). Show all your work.
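Because all 36 outcomes are equally likely, every part of Problem 10 comes down to counting. The sketch below enumerates the outcomes and counts them; writing each outcome as (red, green) is a labelling assumed for this illustration, and the helper `prob` is a made-up name.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes, written here as (red, green).
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event = favorable outcomes / total outcomes."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

def A(o): return o[0] + o[1] == 8   # sum of the two dice is 8
def B(o): return o[0] == 5          # red die shows 5
def C(o): return o[0] == 1          # red die shows 1

p_a = prob(A)
p_a_given_b = prob(lambda o: A(o) and B(o)) / prob(B)
p_b_and_c = prob(lambda o: B(o) and C(o))
p_b_or_c = prob(lambda o: B(o) or C(o))

print("P(A)      =", p_a)
print("P(A | B)  =", p_a_given_b)
print("A, B independent?", p_a_given_b == p_a)
print("B, C independent?", p_b_and_c == prob(B) * prob(C))
print("P(B or C) =", p_b_or_c)
```

Comparing P(A | B) with P(A), and P(B and C) with P(B)P(C), is the definition-based check of independence that parts (c) and (d) ask you to explain in words.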
11. Suppose you collected data for the variables distance traveled in a car and gallons of gasoline used over
several trips and graphed the data in a scatter plot. Would you expect the scatterplot to show a positive
relationship, a negative relationship, or no relationship at all? Explain your answer.
12. Flora and Airto took tests to get into honors language programs. Flora scored 83 on the German test and Airto
scored 83 on the French test. The German test has a mean of 75 and a standard deviation of 10 points; the
French test has a mean of 80 points and a standard deviation of 5 points. Who did better compared to the
others who took their test?
13. People who eat lots of fruits and vegetables have lower rates of colon cancer than those who eat little of these
foods. Since fruits and vegetables are high in vitamins, researchers wondered whether vitamin pills would also
help reduce the rates of colon cancer. The clinical trial studied this question with 864 people who were at risk
of colon cancer. The subjects were divided into four groups: daily vitamin A, daily vitamins C and E, all three
vitamins every day, and daily placebo. After four years, the researchers were surprised to find no significant
difference in the occurrence of colon cancer among the groups.
(a) What are the factors and levels in this experiment?
(b) The study was double-blind. What does that mean?
14. The time to complete a standardized exam follows a Normal model with a mean of 75 minutes and a standard
deviation of 10 minutes.
(a) What percentage of students will complete the exam in an hour (60 minutes) or less? Draw a picture
illustrating your answer.
(b) If I want 90% of the students to complete the test, how many minutes shall I allow them? Draw a picture
illustrating your answer.
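The two Normal-model calculations in Problem 14 can be checked as sketched below, using the stated mean of 75 minutes and standard deviation of 10 minutes. Part (a) is an area to the left of 60; part (b) is the value with 90% of the area to its left.

```python
from statistics import NormalDist

exam_time = NormalDist(mu=75, sigma=10)

# (a) proportion finishing in 60 minutes or less: area to the left of 60
p_within_hour = exam_time.cdf(60)

# (b) time that 90% of students finish within: the 90th percentile
minutes_for_90 = exam_time.inv_cdf(0.90)

print(f"P(time <= 60) = {p_within_hour:.4f}")
print(f"90th percentile = {minutes_for_90:.1f} minutes")
```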
15. According to a survey of 260 randomly selected voters who voted in-person at the last presidential election, the
average time they waited in line to vote at a polling station was 42 minutes with a standard deviation of 16
minutes.
(a) Construct a 90% confidence interval for the average time voters waited in line to vote at a polling station in
the last presidential election. Report your answers to the nearest thousandth (i.e. with three digits after the
decimal point). Be sure to check the required conditions for completing this inferential statistics process.
(b) What is the margin of error for the confidence interval you constructed in part (a)?
16. Draw an example of a distribution that is skewed to the right. Mark on your picture the locations of the mode,
median, and mean.
17. To test an herbal treatment for depression, 100 volunteers who suffered from mild depression were randomly
divided into two groups. Each person was given a month’s supply of tea bags. For one group, the tea
contained the herb mixed with a spice tea, and for the other group, the bags contained only the spice tea.
Participants were not told which type of tea they had. They were asked to drink one cup of the tea per day for
a month. At the end of the month, a psychologist evaluated them to determine if their mood had improved.
The psychologist did not know who had the tea with the herbal ingredient added.
(a) Was this an experiment or an observational study? Explain why you think so.
(b) Was there a control group? Was a placebo treatment used? Explain.
18. When 10 vehicles were observed at random for their speeds (in mph) on a freeway, the following data were
obtained:
66 58 62 60 67 63 65 59 67 65
The speeds of the cars on the freeway are approximately normally distributed.
(a) Construct a 90% confidence interval for the mean speed on this stretch of freeway. You must show me
exactly what you’re doing.
(b) Is this interpretation of your interval true or false: the speeds of 90% of the cars on this freeway fall within
the interval you constructed in part a. Explain your answer.
(c) The speed limit on this stretch of highway is 60 mph. Conduct a hypothesis test to investigate your belief
that drivers seem to be driving in excess of the legal speed limit. State both the null and alternative
hypotheses, find a P-value, and state your conclusion.
19. Runners are concerned about their form when racing. One measure of form is the stride rate, the number of
steps taken per second. As running speed increases, the stride rate should also increase. The stride rate for
different speeds was measured for a group of female runners. Here are the speeds (in feet per second) and the
stride rates.
Speed         15.86   16.88   17.5    18.62   19.97   21.06   22.11
Stride Rate   3.05    3.12    3.17    3.25    3.36    3.46    3.55
(a) Which of these is the explanatory variable and which is the response variable?
(b) Plot the data with speed on the x-axis and stride rate on the y-axis. (You do not have to show me your
plot.) Describe the form of the relationship between the variables. What is its basic shape? Is the
association positive or negative? Is it strong or weak?
(c) Find the equation of the least-squares regression line of stride rate on speed. Use it to predict the stride
rate for a speed of 18 feet per second.
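For part (c) of Problem 19, the least-squares line can be computed directly from the defining formulas: slope = Sxy / Sxx and intercept = (mean of y) − slope × (mean of x). The sketch below does this by hand rather than with a calculator's regression function; the function name `least_squares` is an illustrative choice.

```python
from statistics import mean

speed = [15.86, 16.88, 17.5, 18.62, 19.97, 21.06, 22.11]   # feet per second
stride_rate = [3.05, 3.12, 3.17, 3.25, 3.36, 3.46, 3.55]   # steps per second

def least_squares(x, y):
    """Slope and intercept of the least-squares line of y on x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

slope, intercept = least_squares(speed, stride_rate)
predicted = intercept + slope * 18   # predicted stride rate at 18 ft/s

print(f"stride rate = {intercept:.3f} + {slope:.4f} * speed")
print(f"predicted stride rate at 18 ft/s: {predicted:.2f}")
```

A speed of 18 feet per second lies inside the range of the observed speeds, so this prediction is not an extrapolation.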
20. Twenty men and twenty women with high blood pressure were subjects in an experiment to determine the
effectiveness of a new drug in lowering blood pressure. Ten of the twenty men and ten of the twenty women
were chosen at random to receive the new drug. The remaining ten men and ten women received a placebo.
The design of this experiment is: (Choose one)
(a) completely randomized with one factor -- the drug.
(b) completely randomized with one factor -- gender.
(c) randomized block, blocked on drug and gender.
(d) randomized block, blocked on drug.
(e) randomized block, blocked on gender.
21. A consumer organization estimates that 29% of new cars have a cosmetic defect, such as a scratch or a dent,
when they are delivered to car dealers. This same organization believes that 7% have a functional defect –
something that does not work properly – and that 2% of new cars have both kinds of problems.
(a) If you buy a new car, what is the probability that it has some kind of defect (assuming defects are not
caught by the car dealer)?
(b) If you notice a dent on a new car, what is the probability that it has a functional defect?
(c) Are the two kinds of defects independent?
22. A study of the effects of running on personality involved 231 male runners who each ran about 20 miles per
week. The runners were given the Cattell Sixteen Personality Factor Questionnaire, a 187-item multiple-choice
test often used by psychologists. The report (New York Times) stated:
“The researchers found statistically significant personality differences between the runners and the 30-year-old
male population as a whole.”
A headline on the article said:
Research has shown that running can alter one’s moods.
(a) Explain carefully, as if you were talking to someone who knows no statistics, what “statistically
significant” means.
(b) Explain carefully, as if you were talking to someone who knows no statistics, why the headline is
misleading.
23. An examination consists of multiple-choice questions, each having five possible answers. Linda estimates that
she has probability 0.75 of knowing the correct answer to any question that may be asked. If she doesn’t know
the answer, she will guess, with a conditional probability of 1/5 of being correct.
(a) What is the probability that she will give a correct answer, either because she knows it or because she
guesses correctly?
(b) Given that she gave the correct answer to a question, what is the probability that she guessed it?
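Problem 23 is a total-probability and Bayes' rule calculation. The sketch below assumes, as the problem implies, that Linda always answers correctly when she knows the answer; the variable names are chosen for this illustration.

```python
p_knows = 0.75               # she knows the answer
p_correct_if_guess = 1 / 5   # five choices, one correct

# (a) Total probability: correct because she knows it, or because she guesses right.
# Assumption: when she knows the answer she is always correct.
p_correct = p_knows * 1 + (1 - p_knows) * p_correct_if_guess

# (b) Bayes' rule: P(guessed | correct) = P(guessed and correct) / P(correct)
p_guessed_given_correct = (1 - p_knows) * p_correct_if_guess / p_correct

print(f"P(correct)           = {p_correct:.4f}")
print(f"P(guessed | correct) = {p_guessed_given_correct:.4f}")
```

A tree diagram with the same four branches is a good way to show this work on paper.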
24. A random sample of size 10 is taken from a population. The sample has standard deviation = 0. Mark each of
the following statements true or false. Briefly explain your choice.
(a) The population must also have standard deviation = 0.
(b) The sample mean is equal to the population mean.
(c) The ten data points in the sample are equal in numerical value.
25. Physical abuse of children by their parents is a serious health problem. The potential damage caused by
allowing a case of child abuse to go undetected is great, but the costs of falsely accusing a parent are also
high.
Suppose the experience of school officials indicates that a careful physical examination will detect 95 percent
of battered children. Also suppose that of children who haven’t been battered, 10 percent are actually thought
to have been battered. The best information suggests that 3 percent of school children in an average American
city are being abused by their parents.
Use the following notation:
A = actual Abuse
+ = Diagnosis of Abuse
A^c = No actual Abuse
− = Diagnosis of No Abuse
(a) Give a numerical value for each of the following:
P(+ | A) =
P(− | A) =
P(+ | A^c) =
P(− | A^c) =
(b) Suppose a child is examined and is diagnosed as having been abused. How likely is it that such a child
is, in fact, abused?
26. The table below shows amounts of fat and calories in six fast food hamburgers.
Fat (g)    31    34    35    39    39    43
Calories   580   590   570   640   680   660
(a) Assume a linear model is appropriate. How many calories would you expect in a fast food hamburger with
41 grams of fat?
(b) Is a linear model appropriate? Explain clearly in complete sentences with as much supporting evidence as
possible.
27. The drug AZT was the first effective treatment for AIDS. An important medical experiment demonstrated that
regular doses of AZT delay the onset of symptoms in people in whom HIV is present. The researchers who
carried out this experiment wanted to know:
• Does taking either 500 mg of AZT or 1500 mg of AZT per day delay the development of AIDS?
• Is there any difference between the effects of these two doses?
The subjects were 1200 volunteers already infected with HIV but with no symptoms of AIDS when the study
started.
Design an experiment that would have answered the questions raised in the above paragraph. Use diagrams to
summarize the design. Be sure to address all the elements of experimental design. What are some
confounding variables? How might you control for them?
28. A cereal company claims that the mean weight of the cereal in its packets is at least 14 ounces. Assuming
that a hypothesis test has been correctly conducted and that the conclusion is to reject the null hypothesis,
state the conclusion in nontechnical terms. Your answer should not contain any statistical jargon; instead,
it should contain common English words that clearly convey the specific conclusion that can be drawn
regarding the average (mean) weight of this company’s packets of cereal.
29. The population of Cleansville is 60% men and 40% women. 75% of the men are employed and 80% of the
women are employed.
(a) Find the probability that a person chosen at random from this town is an employed woman (that is, find
P(woman and employed)).
(b) Find the probability that an employed person in this town is a woman (that is, find P(woman | employed)).
30. National data show that on average, college freshmen spend 7.5 hours a week going to parties. One
administrator does not believe that these figures apply at her college, which has nearly 3,000 freshmen. She
thinks her students don’t party that much. She takes a simple random sample of 100 freshmen, and interviews
them. On average, they spend 6.6 hours a week going to parties, and the standard deviation is 9 hours. What
can you say about the administrator’s belief?
31. 57% of Americans say they believe in ESP (extra sensory perception). Suppose we take a random sample of
175 Americans and ask them if they believe in ESP.
(a) Can we use the Binomial probability model to answer questions about the number of “yes” responses you
receive? Why or why not?
(b) What is the probability that 80 or more in our sample will respond “yes”? You must show me exactly what
you’re doing when you find this answer.
32. According to the article “What are the Odds of Dying” (published by the National Safety Council), for a
person born after 1950 in the U.S., the lifetime probability of dying by a transport accident (i.e. car,
motorcycle, truck, pedestrian, etc.) is 1/77. Suppose that we randomly survey the cause of death of 33 people
born in the U.S. after 1950. Find the probability that at most 2 of the 33 people died as a result of a transport
accident. Be sure to justify your use of any mathematical model (i.e. distribution) you employ. Answer the
question with a complete English sentence.
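For Problem 32, one reasonable model is the binomial, assuming the 33 people are independent of one another and each has the same probability 1/77. Under that assumption, a sketch of the calculation is:

```python
from math import comb

n = 33        # people surveyed
p = 1 / 77    # lifetime probability of dying in a transport accident

# P(at most 2) = P(0) + P(1) + P(2) under the Binomial(33, 1/77) model
p_at_most_two = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(3))

print(f"P(at most 2 of 33) = {p_at_most_two:.4f}")
```

You still need to justify the binomial conditions in words and state the conclusion as a complete sentence.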
33. The Boxy-Box Company is considering implementing a quality control plan for monitoring the weights of the
insulated crates that it manufactures. Under Boxy-Box’s manufacturing process, the weights of the individual
crates are approximately normally distributed, with mean μ and standard deviation σ. Their quality control
plan calls for rejecting a crate as defective if its weight falls more than two standard deviations above or below
μ.
(a) What proportion of crates will be rejected under their quality control plan? Illustrate with a picture.
(b) Suppose a simple random sample of 8 crates is chosen from the thousands of crates manufactured in one
week. It is reasonable to use the binomial model for the number of rejected crates. Why is this OK?
What are n and p?
(c) Under their quality control plan, what is the probability that at least 2 of the crates in the sample would be
rejected?
34. A newspaper reports that the governor’s approval rating stands at 65%. The article adds that the poll is based
on a random sample of 972 adults and has a margin of error of 2.5%. What level of confidence did the
pollsters use?
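Problem 34 runs the margin-of-error formula backwards: solve ME = z* × SE for z*, then find the central area that goes with that z*. The sketch below assumes the pollsters used the standard large-sample formula with the sample proportion 0.65 in the standard error.

```python
from math import sqrt
from statistics import NormalDist

p_hat = 0.65     # reported approval rating
n = 972          # sample size
margin = 0.025   # reported margin of error

standard_error = sqrt(p_hat * (1 - p_hat) / n)
z_star = margin / standard_error

# Central area between -z* and +z* under the standard Normal curve
confidence = 2 * NormalDist().cdf(z_star) - 1

print(f"z* = {z_star:.3f}, confidence level is about {confidence:.1%}")
```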
35. The Department of Animal Regulations released information on pet ownership for the population
consisting of all households in a particular rural county. Let the random variable X be the number
of licensed dogs in a randomly selected household. The distribution for the random variable X is
given below:
Value of X    0      1      2      3
Probability   0.39   0.35   0.13   0.13
(a) What is the expected number of dogs in a household in this rural county?
(b) What is the probability that a randomly selected household in this rural county has two or more dogs?
(c) Four houses in this rural county are selected at random. Assuming that the numbers of dogs in these four
houses are independent, what is the probability that none of the selected houses has a dog?
(d) Four houses in this rural county are selected at random. Assuming that the numbers of dogs in these four
houses are independent, what is the probability that at least one of the selected houses has a dog?
(e) Four houses in this rural county are selected at random. Assuming that the numbers of dogs in these four
houses are independent, what is the probability that all of these selected houses have no dog?
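The calculations in Problem 35 follow directly from the probability table. The sketch below uses the independence assumption stated in the problem for the four-house parts; the dictionary name `dogs` is an illustrative choice.

```python
# Probability model for X = number of licensed dogs in a household
dogs = {0: 0.39, 1: 0.35, 2: 0.13, 3: 0.13}

# (a) expected value: sum of (value x probability)
expected_dogs = sum(x * p for x, p in dogs.items())

# (b) two or more dogs
p_two_or_more = dogs[2] + dogs[3]

# (c) none of four independent households has a dog
p_no_dog_in_four = dogs[0] ** 4

# (d) at least one of the four has a dog: complement of (c)
p_at_least_one_has_dog = 1 - p_no_dog_in_four

print(f"E(X) = {expected_dogs:.2f}")
print(f"P(X >= 2) = {p_two_or_more:.2f}")
print(f"P(no dogs in four houses) = {p_no_dog_in_four:.4f}")
print(f"P(at least one house has a dog) = {p_at_least_one_has_dog:.4f}")
```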
36. This morning a news anchor on BNN proclaimed “We are 94% confident that 67% of all American high
school kids drink alcohol at least once a week.” In the context of this problem, what does the phrase 94%
confident mean? Your discussion should include a description of the event that has a 94% chance of
happening. Avoid using the words confidence, confident, and sure. Please limit your use of the word “it” so
that there’s no doubt about what you’re referring to.
37. Describe the consequences of a Type I error and a Type II error for the following hypotheses.
H0: The percentage of women who will develop breast cancer is at most 11%.
HA: The percentage of women who will develop breast cancer is more than 11%.
38. What is a sampling distribution? For what purpose did we use sampling distributions?
39. There are 20 first-class passengers and 120 coach passengers scheduled on a flight. In addition to the usual
security screening, 10% of each class of passengers will be subjected to a more complete search.
(a) Which sampling strategy is employed to randomly select those passengers who will be subjected to a
more complete search?
(b) Among the 20 first-class passengers on the flight, there were four businessmen from Tslovadia. Two of
these four businessmen were the two first-class passengers to be subjected to the more complete search.
They complained of profiling, but the airline claims that the selection was random.
Conduct a simulation to estimate the probability that both of the first-class passengers subjected to the
more complete search were from Tslovadia. Describe how you assign random numbers to conduct your
simulation (be specific). Then, perform your simulation ten times. Show the results of your simulation
and specify the outcome of each trial.
Use a complete English sentence to summarize the results of your ten trials and state your opinion, based
on your simulation, regarding whether the airline was profiling.
(c) Compute the theoretical probability that both of the two randomly selected first-class passengers subjected
to the more complete search were from Tslovadia.
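On the exam you would run the ten trials of Problem 39(b) with a random number table or your calculator, but the same design can be sketched in code. The labelling below (passengers 0 to 19, with 0 to 3 standing for the four Tslovadian businessmen) is an arbitrary choice made for this illustration, as are the function names and the seed.

```python
import random
from math import comb

def both_from_tslovadia(rng):
    """One trial: pick 2 of the 20 first-class passengers without replacement."""
    # Assumed labelling: passengers 0-3 are the four Tslovadian businessmen.
    searched = rng.sample(range(20), 2)
    return all(passenger < 4 for passenger in searched)

def estimate(trials, seed=0):
    """Fraction of trials in which both searched passengers are Tslovadian."""
    rng = random.Random(seed)   # fixed seed so the run can be repeated
    hits = sum(both_from_tslovadia(rng) for _ in range(trials))
    return hits / trials

# (c) theoretical value: ways to choose 2 of the 4, out of ways to choose 2 of the 20
theoretical = comb(4, 2) / comb(20, 2)

print("10 trials:     ", estimate(10))
print("100,000 trials:", estimate(100_000))
print(f"theoretical:    {theoretical:.4f}")
```

With only ten trials the estimate is very rough, which is worth mentioning when you state your opinion about the airline's claim.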
40. Identify the type of sampling strategy (Simple Random Sampling, Stratified Sampling, Cluster Sampling,
Systematic Sampling, or Convenience Sampling).
(a) A biology class at the UW has 160 students. All 160 students attend the lectures together but are split into
4 groups of 40 for lab sections. The professor wants to conduct a survey about how satisfied the students are
with the course so she decides to randomly sample 5 students from each lab section.
(b) Ronny wants to know how often, on average, residents of his town eat out. He surveys 45 people as they
leave the Italian restaurant one block from his house.
(c) A large medical professional organization with membership consisting of doctors, nurses, and other
medical employees wanted to know how its members felt about HMOs (health maintenance organizations).
They randomly selected 500 members from each of the lists of all doctors, all nurses, and all other
employees and surveyed those 1500 members.
(d) When a worker in a factory begins her shift, she uses a random number generator to choose the first car
part she inspects and then she checks every 100th car part thereafter as these parts move past her on an
assembly line.
(e) A large medical professional organization with membership consisting of doctors, nurses, and other
medical employees wanted to know how its members felt about HMOs (health maintenance organizations).
They randomly selected ten cities from all cities in which members lived, and then surveyed all members
in those cities.
41. Suppose that 70% of women purchasing in-home pregnancy tests are pregnant. Since these tests have a 98%
accuracy rating, they will detect 98% of pregnant women. Furthermore, of the women purchasing in-home
pregnancy tests who aren’t actually pregnant, the test says that 98% of them aren’t pregnant.
What is the probability that a woman whose test indicates that she is pregnant actually is?
42. A lecture hall has 190 seats with folding arm tables, 32 of
which are designed for left-handed people. The average
size of classes that meet in this lecture hall is 176, and we
can assume that about 10% of students are left-handed.
What is the probability that a right-handed student in one
of these classes is forced to use a folding arm table designed
for left-handed people?
43. Men tend to have longer feet than women. So, if you find a really long footprint at the scene of a crime, then
in the absence of any other evidence, you would probably conclude that the criminal was a man. And,
conversely, if you find a really short footprint at the scene of a crime, then (again in the absence of any other
information), you would probably conclude that the criminal was a woman. But, where is the cutoff and how
likely is it that you’re making a mistake?
Suppose that men’s foot lengths are normally distributed with mean 25 centimeters and standard deviation
4 centimeters, and women’s foot lengths are normally distributed with mean 19 centimeters and standard
deviation 3 centimeters.
(a) Sketch these two normal curves on the same axis, and label both the curves and the axis.
(b) A reasonable starting point to deciding on a cutoff value is to split the difference: conclude a footprint
belongs to a man if it is longer than 22 centimeters (the midpoint of the means 19 and 25). Using this ...
(i) determine the probability that you will mistakenly identify a man’s footprint as having come from a
woman.
(ii) determine the probability that you will mistakenly identify a woman’s footprint as having come from
a man.
(c) Change the cutoff value (from 22 centimeters to some new value) so that the error probability in item (i) of
part (b) is reduced to 0.08.
(d) Determine the probability of mistakenly identifying a woman’s footprint as having come from a man
using the cutoff value found in part (c).
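The error probabilities in Problem 43 are areas under the two Normal curves on either side of the cutoff. A sketch of the calculations, using the means and standard deviations given in the problem:

```python
from statistics import NormalDist

men = NormalDist(mu=25, sigma=4)     # men's foot lengths (cm)
women = NormalDist(mu=19, sigma=3)   # women's foot lengths (cm)

cutoff = 22
# (b)(i) a man's footprint is shorter than the cutoff, so it is called a woman's
p_man_called_woman = men.cdf(cutoff)
# (b)(ii) a woman's footprint is longer than the cutoff, so it is called a man's
p_woman_called_man = 1 - women.cdf(cutoff)

# (c) choose the cutoff so that only 8% of men's footprints fall below it
new_cutoff = men.inv_cdf(0.08)
# (d) the other error probability at the new cutoff
p_woman_called_man_new = 1 - women.cdf(new_cutoff)

print(f"(b)(i)  {p_man_called_woman:.4f}")
print(f"(b)(ii) {p_woman_called_man:.4f}")
print(f"(c) new cutoff = {new_cutoff:.2f} cm")
print(f"(d) {p_woman_called_man_new:.4f}")
```

Notice the trade-off: lowering the cutoff to shrink one error probability makes the other one larger.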
44. Suppose a 95% confidence interval is accurately computed for μ, resulting in the interval (36.2, 54.8).
Identify those statements that are definitely true. Write the number of each true statement. If none of the
statements are true, write NONE.
(1) 95% of the time, μ falls within the interval (36.2, 54.8).
(2) One can have 95% confidence that μ is 40.5.
(3) 95% of all possible values for μ will fall within the interval (36.2, 54.8).
(4) 95% of the time, p falls within the interval (36.2, 54.8).
(5) Using this method, 95% of all the possible samples will produce the interval (36.2, 54.8) for μ.
(6) The standard error is 4.3.
(7) μ is 40.5.
(8) There is a 95% chance that μ will fall within the interval (36.2, 54.8).
45. A paint manufacturer claims that the mean drying time for its new latex paint is two
hours. To test that claim, the drying times are obtained for twenty randomly selected
cans of paint. The following table displays the data, in minutes.
123 127 131 122 109 106 128 133 115 120
139 119 121 116 110 135 130 136 133 109
Does the data provide sufficient evidence, at the 1% significance level, to conclude that the mean drying time
is greater than the manufacturer's claim of 120 minutes?
Review problems from the textbook:
pp. 155 – 163: Exercises 3, 5, 6, 7, 15, 17, 24, 25, 33, 35, 39
pp. 278 – 286: Exercises 1, 2, 3, 4, 5, 11, 16, 19, 25, 29, 41
pp. 357 – 361: Exercises 1, 3, 5, 7, 11, 15, 26, 31, 35, 36, 37
pp. 452 – 456: Exercises 1, 2, 7, 9, 11, 13, 17ab, 20ac, 21, 22, 31, 35abc, 36, 37bcd, 39, 44
pp. 580 – 584: Exercises 1, 2, 3, 4, 5, 6, 7, 11, 15, 17, 18, 21, 23, 24, 27, 29
pp. 677 – 683: Exercises 3, 4, 9, 15a, 20, 21, 25, 30, 32, 34, 35, 36ab