Statistics Final Exam Review Problems

This is a selection of problems from previous midterms and final exams, along with some selected review problems from the book. Some of these problems are repeated from the sample midterm problems. However, most are previously unseen, so you should also revisit the 68 problems in the Midterm Review Document as part of your review for the final exam. Your final exam will certainly have different problems, but these can give you an idea of what to expect. There will also be many fewer questions on your two-hour test!

The final exam is cumulative. About two-thirds of the test will come from material since the midterm (Parts IV, V, and VI in UNITS 3 and 4). You will be expected to use your calculator; no tables will be provided.

1. In a Marist Institute survey of 950 randomly selected Americans, 54% of the sample answered “yes” to the question “Do you think there is intelligent life on other planets?” Use this information to calculate a 95% confidence interval for the proportion of all Americans who believe there is intelligent life on other planets. You must show me exactly what you’re doing.

2. Human babies can be examined in the womb using ultrasound. Animal studies have suggested that ultrasound examinations can cause low birthweight. As ultrasound became more common, researchers observed an association in humans as well – babies who were exposed more often to ultrasound in the womb had lower birthweight on average than babies who were exposed less often.
(a) What are the explanatory and response variables? Is this association positive or negative?
(b) Suggest at least one lurking or confounding variable that could explain the association.

3. You are performing an experiment to test the effects of a new cold medicine.
(a) Briefly explain what “double-blind” means.
(b) Why would you want this experiment to be double-blind?

4. At a food processing facility, a machine is calibrated so that, on average, the weight of a 16-ounce can of vegetable soup is 16.2 ounces. To test whether the machine is operating correctly or needs recalibration, a random sample of 50 cans was taken and their weights measured. The mean weight of these 50 cans was 16.3 ounces and the standard deviation was 0.25 ounces. At the 0.01 level of significance, what conclusion can be reached?

5. Here is a probability model for blood types of Americans:

Blood type     A      B      AB     O
Probability    ____   0.11   0.04   0.45

(a) Fill in the missing probability in the table.
(b) Suppose four Americans are chosen at random. Find the probability that they all have type O blood.
(c) Suppose four Americans are chosen at random. Find the probability that at least 2 of them have type O blood.

6. A news article reports that of the 411 players on National Basketball Association rosters in February, only 139 “made more than the league average salary” of $2.36 million. Is $2.36 million the mean or median salary for NBA players? How do you know?

7. To profitably produce a planned upgrade of a software product you make, you must charge customers $100. To decide whether your customers are willing to pay this much, you contact a random sample of forty customers and find that eleven would pay $100 for the upgrade. Construct and interpret a 92% confidence interval for the proportion of all customers who would be willing to buy the upgrade for $100. Be sure to state the margin of error of your confidence interval.

8. Here are the scores on the Survey of Study Habits and Attitudes (SSHA) for 18 first-year college women:

154 103 109 126 137 126 115 137 152
165 140 165 154 129 178 200 101 148

(a) Find the mean of the scores.
(b) Make a histogram of the data. (You do not have to draw the histogram for me.) Describe the shape of the distribution. Are there any outliers?
(c) Would the mean increase, decrease, or stay the same if we drop the score of 200 from the list? (You shouldn’t have to compute anything here.)
(d) Find the five-number summary and carefully draw the boxplot for me here. Label your boxplot.

9. Consider a group of professional male athletes, half of whom are gymnasts and half of whom are basketball players. Would you expect a distribution of their heights to be uniform, unimodal, or bimodal? Explain why.

10. Suppose you roll two fair dice (one red, one green). The table shows every possible outcome. Each of the 36 rolls is equally likely.

                               Red die →
Green die ↓     1       2       3       4       5       6
     1         1, 1    1, 2    1, 3    1, 4    1, 5    1, 6
     2         2, 1    2, 2    2, 3    2, 4    2, 5    2, 6
     3         3, 1    3, 2    3, 3    3, 4    3, 5    3, 6
     4         4, 1    4, 2    4, 3    4, 4    4, 5    4, 6
     5         5, 1    5, 2    5, 3    5, 4    5, 5    5, 6
     6         6, 1    6, 2    6, 3    6, 4    6, 5    6, 6

Let A be the event of rolling an 8 on the pair (that is, the sum of the two dice is 8).
Let B be the event of rolling a 5 on the red die.
Let C be the event of rolling a 1 on the red die.
(a) Find P(A).
(b) Find P(A | B).
(c) Are A and B independent events? Why or why not?
(d) Are B and C independent events?
(e) Find P(B or C). Show all your work.

11. Suppose you collected data for the variables distance traveled in a car and gallons of gasoline used over several trips and graphed the data in a scatterplot. Would you expect the scatterplot to show a positive relationship, a negative relationship, or no relationship at all? Explain your answer.

12. Flora and Airto took tests to get into honors language programs. Flora scored 83 on the German test and Airto scored 83 on the French test. The German test has a mean of 75 and a standard deviation of 10 points; the French test has a mean of 80 points and a standard deviation of 5 points. Who did better compared to the others who took their test?

13. People who eat lots of fruits and vegetables have lower rates of colon cancer than those who eat little of these foods.
Since fruits and vegetables are high in vitamins, researchers wondered whether vitamin pills would also help reduce the rates of colon cancer. A clinical trial studied this question with 864 people who were at risk of colon cancer. The subjects were divided into four groups: daily vitamin A, daily vitamins C and E, all three vitamins every day, and daily placebo. After four years, the researchers were surprised to find no significant difference in the occurrence of colon cancer among the groups.
(a) What are the factors and levels in this experiment?
(b) The study was double-blind. What does that mean?

14. The time to complete a standardized exam follows a Normal model with a mean of 75 minutes and a standard deviation of 10 minutes.
(a) What percentage of students will complete the exam in an hour (60 minutes) or less? Draw a picture illustrating your answer.
(b) If I want 90% of the students to complete the test, how many minutes shall I allow them? Draw a picture illustrating your answer.

15. According to a survey of 260 randomly selected voters who voted in person at the last presidential election, the average time they waited in line to vote at a polling station was 42 minutes with a standard deviation of 16 minutes.
(a) Construct a 90% confidence interval for the average time voters waited in line to vote at a polling station in the last presidential election. Report your answers to the nearest thousandth (i.e. with three digits after the decimal point). Be sure to check the required conditions for completing this inferential statistics process.
(b) What is the margin of error for the confidence interval you constructed in part (a)?

16. Draw an example of a distribution that is skewed to the right. Mark on your picture the locations of the mode, median, and mean.

17. To test an herbal treatment for depression, 100 volunteers who suffered from mild depression were randomly divided into two groups. Each person was given a month’s supply of tea bags.
For one group, the tea contained the herb mixed with a spice tea, and for the other group, the bags contained only the spice tea. Participants were not told which type of tea they had. They were asked to drink one cup of the tea per day for a month. At the end of the month, a psychologist evaluated them to determine if their mood had improved. The psychologist did not know who had the tea with the herbal ingredient added.
(a) Was this an experiment or an observational study? Explain why you think so.
(b) Was there a control group? Was a placebo treatment used? Explain.

18. When 10 vehicles were observed at random for their speeds (in mph) on a freeway, the following data were obtained:

66 58 62 60 67 63 65 59 67 65

The speeds of the cars on the freeway are approximately normally distributed.
(a) Construct a 90% confidence interval for the mean speed on this stretch of freeway. You must show me exactly what you’re doing.
(b) Is this interpretation of your interval true or false: the speeds of 90% of the cars on this freeway fall within the interval you constructed in part (a). Explain your answer.
(c) The speed limit on this stretch of highway is 60 mph. Conduct a hypothesis test to investigate your belief that drivers seem to be driving in excess of the legal speed limit. State both the null and alternative hypotheses, find a P-value, and state your conclusion.

19. Runners are concerned about their form when racing. One measure of form is the stride rate, the number of steps taken per second. As running speed increases, the stride rate should also increase. The stride rate for different speeds was measured for a group of female runners. Here are the speeds (in feet per second) and the stride rates.

Speed    Stride Rate
15.86    3.05
16.88    3.12
17.5     3.17
18.62    3.25
19.97    3.36
21.06    3.46
22.11    3.55

(a) Which of these is the explanatory variable and which is the response variable?
(b) Plot the data with speed on the x-axis and stride rate on the y-axis.
(You do not have to show me your plot.) Describe the form of the relationship between the variables. What is its basic shape? Is the association positive or negative? Is it strong or weak?
(c) Find the equation of the least-squares regression line of stride rate on speed. Use it to predict the stride rate for a speed of 18 feet per second.

20. Twenty men and twenty women with high blood pressure were subjects in an experiment to determine the effectiveness of a new drug in lowering blood pressure. Ten of the twenty men and ten of the twenty women were chosen at random to receive the new drug. The remaining ten men and ten women received a placebo. The design of this experiment is: (Choose one)
(a) completely randomized with one factor -- the drug.
(b) completely randomized with one factor -- gender.
(c) randomized block, blocked on drug and gender.
(d) randomized block, blocked on drug.
(e) randomized block, blocked on gender.

21. A consumer organization estimates that 29% of new cars have a cosmetic defect, such as a scratch or a dent, when they are delivered to car dealers. This same organization believes that 7% have a functional defect – something that does not work properly – and that 2% of new cars have both kinds of problems.
(a) If you buy a new car, what is the probability that it has some kind of defect (assuming defects are not caught by the car dealer)?
(b) If you notice a dent on a new car, what is the probability that it has a functional defect?
(c) Are the two kinds of defects independent?

22. A study of the effects of running on personality involved 231 male runners who each ran about 20 miles per week. The runners were given the Cattell Sixteen Personality Factor Questionnaire, a 187-item multiple-choice test often used by psychologists.
The report (New York Times) stated: “The researchers found statistically significant personality differences between the runners and the 30-year old male population as a whole.” A headline on the article said: Research has shown that running can alter one’s moods.
(a) Explain carefully, as if you were talking to someone who knows no statistics, what “statistically significant” means.
(b) Explain carefully, as if you were talking to someone who knows no statistics, why the headline is misleading.

23. An examination consists of multiple-choice questions, each having five possible answers. Linda estimates that she has probability 0.75 of knowing the correct answer to any question that may be asked. If she doesn’t know the answer, she will guess, with a conditional probability of 1/5 of being correct.
(a) What is the probability that she will give a correct answer, either because she knows it or because she guesses correctly?
(b) Given that she gave the correct answer to a question, what is the probability that she guessed it?

24. A random sample of size 10 is taken from a population. The sample has standard deviation = 0. Mark each of the following statements true or false. Briefly explain your choice.
(a) The population must also have standard deviation = 0.
(b) The sample mean is equal to the population mean.
(c) The ten data points in the sample are equal in numerical value.

25. Physical abuse of children by their parents is a serious health problem. The potential damage caused by allowing a case of child abuse to go undetected is great, but the costs of falsely accusing a parent are also high. Suppose the experience of school officials indicates that a careful physical examination will detect 95 percent of battered children. Also suppose that of children who haven’t been battered, 10 percent are actually thought to have been battered. The best information suggests that 3 percent of school children in an average American city are being abused by their parents.
Use the following notation:

A  = actual abuse        +  = diagnosis of abuse
Ac = no actual abuse     −  = diagnosis of no abuse

(a) Give a numerical value for each of the following: P(+ | A), P(− | A), P(+ | Ac), P(− | Ac)
(b) Suppose a child is examined and is diagnosed as having been abused. How likely is it that such a child is, in fact, abused?

26. The table below shows amounts of fat and calories in six fast food hamburgers.

Fat (g)    Calories
31         580
34         590
35         570
39         640
39         680
43         660

(a) Assume a linear model is appropriate. How many calories would you expect in a fast food hamburger with 41 grams of fat?
(b) Is a linear model appropriate? Explain clearly in complete sentences with as much supporting evidence as possible.

27. The drug AZT was the first effective treatment for AIDS. An important medical experiment demonstrated that regular doses of AZT delay the onset of symptoms in people in whom HIV is present. The researchers who carried out this experiment wanted to know: Does taking either 500 mg of AZT or 1500 mg of AZT per day delay the development of AIDS? Is there any difference between the effects of these two doses? The subjects were 1200 volunteers already infected with HIV but with no symptoms of AIDS when the study started. Design an experiment that would have answered the questions raised in the above paragraph. Use diagrams to summarize the design. Be sure to address all the elements of experimental design. What are some confounding variables? How might you control for them?

28. A cereal company claims that the mean weight of the cereal in its packets is at least 14 ounces. Assuming that a hypothesis test has been correctly conducted and that the conclusion is to reject the null hypothesis, state the conclusion in nontechnical terms. Your answer should not contain any statistical jargon; instead, it should contain common English words that clearly convey the specific conclusion that can be drawn regarding the average (mean) weight of this company’s packets of cereal.

29. The population of Cleansville is 60% men and 40% women. 75% of the men are employed and 80% of the women are employed.
(a) Find the probability that a person chosen at random from this town is an employed woman (that is, find P(woman and employed)).
(b) Find the probability that an employed person in this town is a woman (that is, find P(woman | employed)).

30. National data show that on average, college freshmen spend 7.5 hours a week going to parties. One administrator does not believe that these figures apply at her college, which has nearly 3,000 freshmen. She thinks her students don’t party that much. She takes a simple random sample of 100 freshmen and interviews them. On average, they spend 6.6 hours a week going to parties, and the standard deviation is 9 hours. What can you say about the administrator’s belief?

31. 57% of Americans say they believe in ESP (extrasensory perception). Suppose we take a random sample of 175 Americans and ask them if they believe in ESP.
(a) Can we use the Binomial probability model to answer questions about the number of “yes” responses we receive? Why or why not?
(b) What is the probability that 80 or more in our sample will respond “yes”? You must show me exactly what you’re doing when you find this answer.

32. According to the article “What are the Odds of Dying” (published by the National Safety Council), for a person born after 1950 in the U.S., the lifetime probability of dying by a transport accident (i.e. car, motorcycle, truck, pedestrian, etc.) is 1/77. Suppose that we randomly survey the cause of death of 33 people born in the U.S. after 1950. Find the probability that at most 2 of the 33 people died as a result of a transport accident. Be sure to justify your use of any mathematical model (i.e. distribution) you employ. Answer the question with a complete English sentence.

33. The Boxy-Box Company is considering implementing a quality control plan for monitoring the weights of the insulated crates that it manufactures. Under Boxy-Box’s manufacturing process, the weights of the individual crates are approximately normally distributed, with mean μ and standard deviation σ. Their quality control plan calls for rejecting a crate as defective if its weight falls more than two standard deviations above or below μ.
(a) What proportion of crates will be rejected under their quality control plan? Illustrate with a picture.
(b) Suppose a simple random sample of 8 crates is chosen from the thousands of crates manufactured in one week. It is reasonable to use the binomial model to model the number of rejected crates. Why is this OK? What are n and p?
(c) Under their quality control plan, what is the probability that at least 2 of the crates in the sample would be rejected?

34. A newspaper reports that the governor’s approval rating stands at 65%. The article adds that the poll is based on a random sample of 972 adults and has a margin of error of 2.5%. What level of confidence did the pollsters use?

35. The Department of Animal Regulations released information on pet ownership for the population consisting of all households in a particular rural county. Let the random variable X be the number of licensed dogs in a randomly selected household. The distribution for the random variable X is given below:

Value of X     0      1      2      3
Probability    0.39   0.35   0.13   0.13

(a) What is the expected number of dogs in a household in this rural county?
(b) What is the probability that a randomly selected household in this rural county has two or more dogs?
(c) Four houses in this rural county are selected at random. Assuming that the numbers of dogs in these four houses are independent, what is the probability that none of the selected houses has a dog?
(d) Four houses in this rural county are selected at random.
Assuming that the numbers of dogs in these four houses are independent, what is the probability that at least one of the selected houses has a dog?
(e) Four houses in this rural county are selected at random. Assuming that the numbers of dogs in these four houses are independent, what is the probability that all of these selected houses have no dog?

36. This morning a news anchor on BNN proclaimed “We are 94% confident that 67% of all American high school kids drink alcohol at least once a week.” In the context of this problem, what does the phrase 94% confident mean? Your discussion should include a description of the event that has a 94% chance of happening. Avoid using the words confidence, confident, and sure. Please limit your use of the word “it” so that there’s no doubt about what you’re referring to.

37. Describe the consequences of a Type I error and a Type II error.
H0: The percentage of women who will develop breast cancer is at most 11%.
HA: The percentage of women who will develop breast cancer is more than 11%.

38. What is a sampling distribution? For what purpose did we use sampling distributions?

39. There are 20 first-class passengers and 120 coach passengers scheduled on a flight. In addition to the usual security screening, 10% of each class of passengers will be subjected to a more complete search.
(a) Which sampling strategy is employed to randomly select those passengers who will be subjected to a more complete search?
(b) Among the 20 first-class passengers on the flight, there were four businessmen from Tslovadia. Two of these four businessmen were the two first-class passengers to be subjected to the more complete search. They complained of profiling, but the airline claims that the selection was random. Conduct a simulation to estimate the probability that both of the first-class passengers subjected to the more complete search were from Tslovadia. Describe how you assign random numbers to conduct your simulation (be specific).
Then, perform your simulation ten times. Show the results of your simulation and specify the outcome of each trial. Use a complete English sentence to summarize the results of your ten trials and state your opinion, based on your simulation, regarding whether the airline was profiling.
(c) Compute the theoretical probability that both of the two randomly selected first-class passengers subjected to the more complete search were from Tslovadia.

40. Identify the type of sampling strategy (Simple Random Sampling, Stratified Sampling, Cluster Sampling, Systematic Sampling, or Convenience Sampling).
(a) A biology class at the UW has 160 students. All 160 students attend the lectures together but are split into 4 groups of 40 for lab sections. The professor wants to conduct a survey about how satisfied the students are with the course, so she decides to randomly sample 5 students from each lab section.
(b) Ronny wants to know how often, on average, residents of his town eat out. He surveys 45 people as they leave the Italian restaurant one block from his house.
(c) A large medical professional organization with membership consisting of doctors, nurses, and other medical employees wants to know how its members feel about HMOs (health maintenance organizations). They randomly selected 500 members from each of the lists of all doctors, all nurses, and all other employees and surveyed those 1500 members.
(d) When a worker in a factory begins her shift, she uses a random number generator to choose the first car part she inspects and then she checks every 100th car part thereafter as these parts move past her on an assembly line.
(e) A large medical professional organization with membership consisting of doctors, nurses, and other medical employees wants to know how its members feel about HMOs (health maintenance organizations). They randomly selected ten cities from all cities in which members lived, and then surveyed all members in those cities.

41. Suppose that 70% of women purchasing in-home pregnancy tests are pregnant. Since these tests have a 98% accuracy rating, they will detect 98% of pregnant women. Furthermore, of the women purchasing in-home pregnancy tests who aren’t actually pregnant, the test says that 98% of them aren’t pregnant. What is the probability that a woman whose test indicates that she is pregnant actually is?

42. A lecture hall has 190 seats with folding arm tables, 32 of which are designed for left-handed people. The average size of classes that meet in this lecture hall is 176, and we can assume that about 10% of students are left-handed. What is the probability that a right-handed student in one of these classes is forced to use a folding arm table designed for left-handed people?

43. Men tend to have longer feet than women. So, if you find a really long footprint at the scene of a crime, then in the absence of any other evidence, you would probably conclude that the criminal was a man. And, conversely, if you find a really short footprint at the scene of a crime, then (again in the absence of any other information), you would probably conclude that the criminal was a woman. But, where is the cutoff and how likely is it that you’re making a mistake? Suppose that men’s foot lengths are normally distributed with mean 25 centimeters and standard deviation 4 centimeters, and women’s foot lengths are normally distributed with mean 19 centimeters and standard deviation 3 centimeters.
(a) Sketch these two normal curves on the same axis, and label both the curves and the axis.
(b) A reasonable starting point for deciding on a cutoff value is to split the difference: conclude a footprint belongs to a man if it is longer than 22 centimeters (the midpoint of the means 19 and 25). Using this ...
    (i) determine the probability that you will mistakenly identify a man’s footprint as having come from a woman.
    (ii) determine the probability that you will mistakenly identify a woman’s footprint as having come from a man.
(c) Change the cutoff value (from 22 centimeters to some new value) so that the error probability in item (i) of part (b) is reduced to 0.08.
(d) Determine the probability of mistakenly identifying a woman’s footprint as having come from a man using the cutoff value found in part (c).

44. Suppose a 95% confidence interval is accurately computed for resulting in the interval (36.2, 54.8). Identify those statements that are definitely true. Write the number of each true statement. If none of the statements are true, write NONE.
95% of the time, falls within the interval (36.2, 54.8).
One can have 95% confidence that is 40.5.
95% of all possible values for will fall within the interval (36.2, 54.8).
95% of the time, p falls within the interval (36.2, 54.8).
Using this method, 95% of all the possible samples will produce the interval (36.2, 54.8) for .
The standard error is 4.3.
is 40.5.
There is a 95% chance that will fall within the interval (36.2, 54.8).

45. A paint manufacturer claims that the mean drying time for its new latex paint is two hours. To test that claim, the drying times are obtained for twenty randomly selected cans of paint. The following table displays the data, in minutes.

123 127 131 122 109 106 128 133 115 120
139 119 121 116 110 135 130 136 133 109

Do the data provide sufficient evidence, at the 1% significance level, to conclude that the mean drying time is greater than the manufacturer's claim of 120 minutes?

Review problems from the textbook:
pp. 155 – 163   Exercises 3, 5, 6, 7, 15, 17, 24, 25, 33, 35, 39
pp. 278 – 286   Exercises 1, 2, 3, 4, 5, 11, 16, 19, 25, 29, 41
pp. 357 – 361   Exercises 1, 3, 5, 7, 11, 15, 26, 31, 35, 36, 37
pp. 452 – 456   Exercises 1, 2, 7, 9, 11, 13, 17ab, 20ac, 21, 22, 31, 35abc, 36, 37bcd, 39, 44
pp. 580 – 584   Exercises 1, 2, 3, 4, 5, 6, 7, 11, 15, 17, 18, 21, 23, 24, 27, 29
pp. 677 – 683   Exercises 3, 4, 9, 15a, 20, 21, 25, 30, 32, 34, 35, 36ab
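
Problem 39(b) asks for a simulation with an explicit assignment of random numbers. One possible setup is sketched below in Python, purely as a way to check hand work; it is not the required method, and on the exam you would use your calculator's random number generator. The labeling (passengers numbered 1 to 20, with 1 to 4 standing for the four businessmen from Tslovadia) and the function names both_from_tslovadia and estimate are assumptions made for this sketch.

```python
import random


def both_from_tslovadia(rng):
    """Run one trial: pick 2 of the 20 first-class passengers at random.

    Assumed labeling for this sketch: passengers are numbered 1-20 and
    numbers 1-4 stand for the four businessmen from Tslovadia.
    """
    searched = rng.sample(range(1, 21), 2)  # two distinct passengers
    return all(p <= 4 for p in searched)


def estimate(trials, seed=0):
    """Return the fraction of trials in which both searched passengers
    were from Tslovadia. The seed only makes the run repeatable."""
    rng = random.Random(seed)
    hits = sum(both_from_tslovadia(rng) for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    print("10 trials:", estimate(10))
    print("100,000 trials:", estimate(100_000))
```

Ten trials, as the problem requests, give only a rough estimate; the larger run is included so the result can be compared with the theoretical probability from part (c).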