Fall Final Exam Review
AP Statistics
Name: ___________________________

I. Vocabulary. You should be able to define and illustrate the following:

Population, Sample, Convenience sample, Voluntary response sample, Random sampling, Simple random sample (SRS), Stratified random sample, Cluster sample, Undercoverage, Nonresponse, Response bias, Wording of question, Observational study, Experiments, Confounded, Explanatory, Response, Treatments, Experimental units, Factors, Completely randomized design, Placebo, Double-blind, Randomized block design, Matched-pairs design, Frequency, Relative Frequency, Pie Chart, Bar Graph, Two-way Table, Quantitative, Categorical, Marginal Distribution, Conditional Distribution, Dotplot, Stemplot, Histogram, Shape, Outliers, Center, Spread, Symmetric, Skewed, Mean, Median, First Quartile, Third Quartile, IQR, Standard Deviation, Five-number summary, Boxplot, Variance, Resistant, Z-scores, Linear Transformation, Density curve, Normal Distribution, 68-95-99.7 Rule, Standard Normal Distribution, Scatterplot, Form, Direction, Strength, Correlation, Regression Line, Slope, Y-intercept, Extrapolation, Least-Squares Regression Line, Residuals, Residual plot, Coefficient of determination, Influential observations, Law of large numbers, Probability, Simulation, Probability model, Sample space, Event, Complement, Mutually exclusive, Two-way table, Venn diagram, General addition rule, Conditional probability, General multiplication rule, Tree diagram, Independent, Random variable, Probability distribution, Discrete random variable, Continuous random variable

II. Multiple Choice. Select the letter that corresponds with the best answer.

1. A farmer wishes to study the effect of three different fertilizers on crop yields. He takes a rectangular field and divides it into four plots of equal area. Then he randomly assigns each of the three fertilizers to one of the four plots; the remaining plot receives no fertilizer. The plots are harvested after a [...]. Which of the following statements best describes the design of the study?
I.
This design has matched pairs.
II. This design has blocks.
III. This is a completely randomized design.
(a) I only (b) I and III only (c) I and II only (d) II only (e) III only

2. The purpose of doing an experiment is to:
(a) Determine cause and effect
(b) Identify confounding variables
(c) Identify lurking variables
(d) Control participants
(e) Find human error

3. At a charter high school, administrators wish to collect a sample of 50 students. The proportions of the student body represented by each class are: 45% freshmen, 28% sophomores, 16% juniors, and 11% seniors. They decide to randomly sample 23 freshmen, 14 sophomores, eight juniors, and five seniors. Which of the following methods was used to select the probability sample?
(a) Random (b) Systematic (c) Stratified (d) Cluster (e) Simple Random

4. A charter school operator in Los Angeles wishes to gather information about student achievement. From the 73 small schools the operator manages, one school is selected by lottery and all students from that school are used in the sample. Which of the following methods was used to select the probability sample?
(a) Random (b) Systematic (c) Stratified (d) Cluster (e) Simple Random

5. When a set of data has suspected outliers, which of the following are preferred measures of central tendency and of variability?
(a) Mean and standard deviation
(b) Mean and variance
(c) Mean and range
(d) Median and range
(e) Median and interquartile range

6. The scores on a statistics test had a mean of 81 and a standard deviation of 9. One student was absent on the test day, and his score wasn't included in the calculation. If his score of 84 were added to the distribution of scores, what would happen to the mean and standard deviation?
(a) Mean will increase, and standard deviation will increase.
(b) Mean will increase, and standard deviation will decrease.
(c) Mean will increase, and standard deviation will remain the same.
(d) Mean will decrease, and standard deviation will increase.
(e) Mean will decrease, and standard deviation will decrease.

7. Forty students took a statistics examination having a maximum of 50 points. The score distribution is given in the following stem-and-leaf plot:

0 | 2 8
1 | 2 2 4 5
2 | 0 1 3 3 3 3 5 8 8 8 9
3 | 0 0 1 3 5 6 6 7 9
4 | 2 2 4 4 4 4 6 6 7 8 8
5 | 0 0 0

The third quartile of the score distribution is equal to:
(a) 45 (b) 44 (c) 43 (d) 32 (e) 23

8. The interquartile range represents ______________________ of a data set.
(a) The bottom 25%
(b) The difference between the maximum and minimum value
(c) The middle half
(d) The top half
(e) The bottom half

9. Suppose the scores on an exam have a mean of 75 with a standard deviation of 8. If one student has a test result with a z-score of -1.5, and a second student has a test result with a z-score of 2.0, how many points higher was the second student's result than that of the first?
(a) 3.5 (b) 4 (c) 12 (d) 16 (e) 28

10. Assuming that heights of professional male tennis players follow a bell-shaped distribution, arrange in ascending order:
I. A height with a z-score of 1
II. A height with a percentile rank of 80 percent
III. A height at the third quartile Q3
(a) I, II, III (b) I, III, II (c) II, I, III (d) III, I, II (e) III, II, I

Questions 11 & 12 refer to the following setting. The weights of laboratory cockroaches follow a Normal distribution with mean 80 grams and standard deviation 2 grams. The following figure is the Normal curve for this distribution of weights.

11. Point C on this Normal curve corresponds to
(a) 78 grams (b) 74 grams (c) 74 grams (d) 82 grams (e) 76 grams

12. About what percent of cockroaches have weights between 76 and 84 grams?
(a) 99.7% (b) 68% (c) 95% (d) 34% (e) 47.5%

13. Data on ages (in years) and prices (in $100) for ten cars of a specific model result in the regression line: predicted price = 250 − 30(Age).
Given that 64% of the variation in price is explainable by variation in age, what is the value of the correlation coefficient r?
(a) −0.64
(b) −0.80
(c) 0.64
(d) 0.80
(e) There is insufficient information to answer this question.

14. Data are obtained from a random sample of women with regard to their ages and their monthly expenditures on health products. The resulting regression equation is: predicted expenditure = 43 + (Age), with r = 0.27. What percentage of the variation in expenditures can be explained by looking at ages?
(a) 0.23% (b) 23% (c) 7.29% (d) 27% (e) 52.0%

15. There is a linear relationship between the number of chirps made by the striped ground cricket and the air temperature. A least-squares fit of some data collected by a biologist gives the model ŷ = 25.2 + 3.3x, where x is the number of chirps per minute and ŷ is the estimated temperature in degrees Fahrenheit. What is the predicted temperature for a cricket that chirps 15 times per minute?
(a) 3.3°F (b) 74.7°F (c) 25.2°F (d) 49.5°F (e) 41.7°F

16. Using the LSRL from #15, find and interpret the residual in context for a striped ground cricket that chirped 18 times per minute while it was 80°F.
(a) Residual = 4.6°F. The temperature was 4.6°F higher than what was expected for a striped ground cricket that chirped 18 times per minute.
(b) Residual = −4.6 chirps. The striped ground cricket chirped 4.6 chirps fewer than what we expected at the temperature of 80°F.
(c) Residual = 4.6 chirps. The striped ground cricket chirped 4.6 chirps more than what we expected at the temperature of 80°F.
(d) Residual = −4.6°F. The temperature was 4.6°F lower than what was expected for a striped ground cricket that chirped 18 times per minute.
(e) Residual = 20.6°F. The temperature was 20.6°F higher than what we expected for a striped ground cricket that chirped 18 times per minute.

17. Suppose P(X) = 0.25 and P(Y) = 0.40. If P(X | Y) = 0.20, what is P(Y | X)?
(a) 0.10 (b) 0.125 (c) 0.32 (d) 0.45 (e) 0.50

18.
If P(A) = 0.25 and P(B) = 0.34, what is P(A ∪ B) if A and B are independent?
(a) 0.085 (b) 0.505 (c) 0.590 (d) 0.675 (e) Insufficient information

19. Suppose you toss a fair coin ten times and it comes up heads every time. Which of the following is a true statement?
(a) By the Law of Large Numbers, the next toss is more likely to be tails than another heads.
(b) By the properties of conditional probability, the next toss is more likely to be heads given that ten tosses in a row have been heads.
(c) Coins actually do have memories, and thus what comes up on the next toss is influenced by the past tosses.
(d) The Law of Large Numbers tells how many tosses will be necessary before the percentages of heads and tails are again in balance.
(e) None of the above are true statements.

20. Choose an American adult at random. The probability that you choose a woman is 0.52. The probability that the person you choose has never married is 0.25. The probability that you choose a woman who has never married is 0.11. The probability that the person you choose is either a woman or has never been married (or both) is therefore about
(a) 0.77 (b) 0.66 (c) 0.44 (d) 0.38 (e) 0.13

21. A television game show has three payoffs with the following probabilities:

Payoff ($):   0     500    5,000
Probability:  0.7   0.25   0.05

What are the mean and standard deviation of the payoff variable?
(a) µ = 375, σ = 361
(b) µ = 375, σ = 1,083
(c) µ = 1,833, σ = 1,816
(d) µ = 1,833, σ = 2,248
(e) None of the above gives a set of correct answers.

22. A certain vending machine offers 20-ounce bottles of soda for $1.50. The number of bottles X bought from the machine on any day is a random variable with mean 50 and standard deviation 15. Let the random variable Y equal the total revenue from this machine on a given day. Assume that the machine works properly and that no sodas are stolen from the machine. What are the mean and standard deviation of Y?
(a) µ_Y = $1.50, σ_Y = $22.50
(b) µ_Y = $1.50, σ_Y = $33.75
(c) µ_Y = $75, σ_Y = $18.37
(d) µ_Y = $75, σ_Y = $22.50
(e) µ_Y = $75, σ_Y = $33.75

23. Suppose outstanding loans for college graduates are normally distributed with a mean of $23,500 and a standard deviation of $7,200. If a college graduate is randomly selected, what is the probability that the graduate's outstanding loan is under $21,000?
(a) 0.0000 (b) 0.6368 (c) 0.6528 (d) 0.3632 (e) 0.3472

24. The number of hybrid cars a dealer sells weekly has the following probability distribution:

Number of hybrids:  0     1     2     3     4     5
Probability:        0.32  0.28  0.15  0.11  0.08  0.06

The dealer purchases each car for $21,000 and sells it for $24,500. What is the expected weekly profit from selling hybrid cars?
(a) $2,380 (b) $3,500 (c) $5,355 (d) $8,109 (e) $37,485

Multiple Choice Answers:
1. E, 2. A, 3. C, 4. D, 5. E, 6. B, 7. B, 8. C, 9. E, 10. E, 11. A, 12. C,
13. B, 14. C, 15. B, 16. D, 17. C, 18. B, 19. E, 20. B, 21. B, 22. D, 23. D, 24. C

III. Free Response. (YOU MAY NEED TO DO THESE ON A SEPARATE SHEET OF PAPER!)

25. A group of students wants to perform an experiment to determine whether Brand A or Brand B deodorant lasts longer. One group member suggests the following design: Recruit 40 student volunteers (20 male and 20 female). Separate by gender because male and female bodies might respond differently to deodorant. Give all the males Brand A deodorant and all the females Brand B. Have each student rate how well the deodorant is still working at the end of the school day on a 0 to 10 scale. Then compare ratings for the two treatments.
(a) Identify any flaws you see in the proposed design for this experiment.
(b) Describe how you would design the experiment. Explain how your design addresses each of the problems you identified in part (a). Make sure you use the four principles of experimental design: comparison, control, randomization, and replication.
(c) What are the researchers measuring at the end?

26.
A hotel has 30 floors with 40 rooms per floor. The rooms on one side of the hotel face the water, while rooms on the other side face a golf course. There is an extra charge for the rooms with a water view. The hotel manager wants to survey 120 guests who stayed at the hotel during a convention about their overall satisfaction with the property.
(a) Explain why choosing a stratified random sample might be preferable to an SRS in this case. What would you use as strata?
(b) Why might a cluster sample be a simpler option? What would you use as clusters?

27. (a) Which numerical summary is best to use if we have a skewed distribution? Why?
(b) Which numerical summary is best to use if we have a symmetric distribution (free of outliers)? Why?

28. Here are the survival times in days of 72 guinea pigs after they were injected with infectious bacteria in a medical experiment. Survival times, whether of machines under stress or cancer patients after treatment, usually have distributions that are skewed to the right.

 43  45  53  56  56  57  58  66  67  73  74  79
 80  80  81  81  81  82  83  83  84  88  89  91
 91  92  92  97  99  99 100 100 101 102 102 102
103 104 107 108 109 113 114 118 121 123 126 128
137 138 139 144 145 147 156 162 174 178 179 184
191 198 211 214 243 249 329 380 403 511 522 598

(a) Make a histogram of the data and describe its main features. Does it show the expected right skew?
(b) Now make a boxplot of the data. Be sure to check for outliers.
(c) Which measures of center and spread would you use to summarize the distribution: the mean and standard deviation, or the median and IQR? Justify your answer.

29. (R1.4, p. 77) Is there a relationship between Facebook use and age among college students? The following two-way table displays data for the 219 students who responded to the survey.

                         Facebook user?
Age                      Yes     No
Younger (18-22)          78      4
Middle (23-27)           49      21
Older (28 and up)        21      46

(a) Find the marginal distribution for age.
(b) Find the conditional distribution for age among Facebook users. Now, find the conditional distribution for age among non-Facebook users. Compare the distributions using a bar graph.

30. Rainwater was collected in water collectors at 30 different sites near an industrial complex, and the amount of acidity (pH level) was measured. The mean and standard deviation of the values are 4.60 and 1.10, respectively. When the pH meter was recalibrated back at the laboratory, it was found to be in error. The error can be corrected by adding 0.1 pH units to all of the values and then multiplying the result by 1.2. Calculate the corrected pH measures (mean and standard deviation).

31. The distribution of weights of 9-ounce bags of a particular brand of potato chips is approximately Normal with mean µ = 9.12 ounces and standard deviation σ = 0.05 ounce. Draw an accurate sketch of the distribution of potato chip bag weights. Be sure to label the mean, as well as the points 1, 2, and 3 standard deviations away from the mean on the horizontal axis.
(a) Between what weights do the middle 68% of bags fall?
(b) What percent of bags weigh less than 9.02 ounces?
(c) What percent of 9-ounce bags of this brand of potato chips weigh between 8.97 and 9.17 ounces?
(d) A bag that weighs 9.07 ounces is at what percentile in this distribution?

32. Good runners take more steps per second as they speed up. Here are the average numbers of steps per second for a group of top female runners at different speeds. The speeds are in feet per second.

Speed (ft/s):      15.86  16.88  17.50  18.62  19.97  21.06  22.11
Steps per second:   3.05   3.12   3.17   3.25   3.36   3.46   3.55

(a) You want to predict steps per second from running speed. Make a scatterplot of the data with this goal in mind.
(b) Describe the pattern of the data in context and find the correlation.
(c) Find the least-squares regression line of steps per second on running speed. Draw this line on your scatterplot.
(d) Does running speed explain most of the variation in the number of steps a runner takes per second? Calculate r² and use it to answer this question.
(e) Predict the steps per second for a runner who has a speed of 17.65 ft/s.
(f) If you wanted to predict running speed from a runner's steps per second, would you use the same line? Explain your answer. Would r² stay the same?

33. Police report that 78% of drivers stopped on suspicion of drunk driving are given a breath test, 36% a blood test, and 22% both tests.
(a) Using the General Addition Rule, find the probability that a randomly selected DWI suspect is given a blood test or a breath test.
(b) Represent this situation in a two-way table.
(c) For a randomly selected DWI suspect, find:
1. P(either test)
2. P(only a blood test)
3. P(only a breath test)
4. P(neither test)
(d) Are the tests independent? Use probability rules to support your answer.

34. A travel agent books passages on three different tours, with half her customers choosing tour one (T1), one-third choosing tour two (T2), and the rest choosing tour three (T3). The agent noted that three-quarters of those who take tour one return to book passage again, two-thirds of those who take tour two return, and one-half of those who take tour three return. If a customer does return, what is the probability that the person first went on tour two? (Use a tree diagram.)

35. Now that the new models are here, a car dealership has lowered prices on last year's models. An energetic salesperson estimates the following probability distribution of X, the number of cars that she'll sell next week.

X:     0     1     2     3     4
P(X):  0.05  0.15  ____  0.25  0.20

(a) Find P(X = 2) and fill it in the table.
(b) Find and interpret the expected value and the standard deviation of X.
(c) Suppose that this salesperson earns a fixed weekly wage of $300 plus a $2000 commission for each car sold. What are her expected weekly wages? (Let Y = weekly wages.)
What is the standard deviation of her weekly wages?

36. You have two scales for measuring weights in a chemistry lab. Both scales give answers that vary a bit in repeated weighings of the same item. If the true weight of a compound is 2.00 grams (g), the first scale produces readings X that have a mean of 2.000 g and a standard deviation of 0.002 g. The second scale's readings Y have a mean of 2.001 g and a standard deviation of 0.001 g. Find the probability that if an item is weighed on both scales, its reading from the first scale will be less than its reading from the second scale. Assume the readings from both scales are Normally distributed and independent of each other.
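
One way to check work on question 36 is sketched below in Python, using only the standard library. It relies on the fact that, for independent Normal readings, the difference D = X − Y is also Normal, with the means subtracting and the variances adding. The function name prob_first_less_than_second is a hypothetical helper chosen for this illustration and is not part of the review sheet.

```python
from math import sqrt
from statistics import NormalDist


def prob_first_less_than_second(mu_x, sd_x, mu_y, sd_y):
    """Return P(X < Y) for independent Normal readings X and Y."""
    # D = X - Y is Normal: means subtract, variances add (independence assumed).
    mu_d = mu_x - mu_y
    sd_d = sqrt(sd_x ** 2 + sd_y ** 2)
    # The event X < Y is the same as D < 0.
    return NormalDist(mu_d, sd_d).cdf(0.0)


if __name__ == "__main__":
    # Values taken from question 36: X ~ N(2.000, 0.002), Y ~ N(2.001, 0.001).
    p = prob_first_less_than_second(2.000, 0.002, 2.001, 0.001)
    print(f"P(X < Y) = {p:.4f}")
```

With the values from question 36 this comes out to about 0.67, which should agree with a hand calculation using a z-score and Table A or a calculator's normalcdf.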