AP Statistics Testbank 2

Name ____________________________________ Date _____________________ Period __________

A Few Formulas:

$s_x^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$

$r = \frac{1}{n-1}\sum_{i=1}^{n}\left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right)$

$\hat{y} = a + bx$, $b = r\,\frac{s_y}{s_x}$, $a = \bar{y} - b\bar{x}$.

Section 1: Multiple-Choice and Short Response Questions

1) A researcher is interested in determining if one could predict the score on a statistics exam from the amount of time spent studying for the exam. In this study, the explanatory variable is (circle the correct answer)
a) the researcher.
b) the amount of time spent studying for the exam.
c) the score on the exam.
d) the fact that this is a statistics exam.
e) the likelihood of your passing Mr. S’s statistics course.

2) You are in the process of trying to determine if the score on a statistics examination can be predicted from the amount of time spent studying. In this study, which is the explanatory variable and which is the response variable?
Explanatory variable: ___________________________________
Response variable: ____________________________________

3) Suppose that you were to draw a scatterplot relating the heights and weights of mature adults in a particular ethnic tribe. Either determine what the explanatory variable and the response variable should be, or state that it really doesn’t matter which variable is called which.
Your answer: ____________________________________________________________________

4) Consider n data pairs $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$. Assume that the mean of the x-values is $\bar{x} = 5$ and that the sample standard deviation of the x-values is $s_x = 4$. Assume also that the mean of the y-values is $\bar{y} = 10$ and that the sample standard deviation of the y-values is $s_y = 10$. Assume finally that the correlation of the data is given by $r = 0.6$. Of the following, which could be the least squares regression line? (Circle one.)
a) $\hat{y} = -5.0 + 3.0x$
b) $\hat{y} = 1.5x$
c) $\hat{y} = 2.5 + 1.5x$
d) $\hat{y} = 2.5 - 1.5x$
e) $\hat{y} = -2.5 + 1.5x$

5) A study found a correlation of $r = -0.61$ between the gender of a worker and his or her income. You may correctly conclude (circle the correct answer)
a) that the study is flawed; correlation makes no sense in this context.
b) that the study shows that women typically earn less than men.
c) that the study shows that the greater the salary, the greater the tendency to be a man.
d) that the study is flawed; only a positive correlation would be possible in this situation.

6) Suppose that we have 10 data pairs $(x_1, y_1), (x_2, y_2), \ldots, (x_{10}, y_{10})$ with correlation $r = -0.61$. Then the correlation of the new set of data $(2x_1 + 1, y_1), (2x_2 + 1, y_2), \ldots, (2x_{10} + 1, y_{10})$ has value
a) 0.39
b) −0.22
c) −0.61
d) 0.61

7) We measure a response variable Y at each of several times. The resulting scatterplot of log Y versus time of measurement looks approximately like a positively sloping straight line. We may conclude that
a) the correlation between time of measurement and Y is negative, since logarithms of positive fractions (such as correlations) are negative.
b) the rate of growth of Y is positive, but slowing down over time.
c) an exponential curve would approximately describe the relationship between Y and time of measurement.
d) a mistake has been made. It would have been better to plot Y versus the logarithm of the time of measurement.

8) A researcher wishes to study how the average weight Y (in kilograms) of children changes during the first year of life. He plots these averages versus the age X (in months) and decides to fit a least-squares regression line to the data with X as the explanatory variable and Y as the response variable. He computes the following quantities.
$r$ = correlation between X and Y = 0.9
$\bar{x}$ = mean of the values of X = 6.5
$\bar{y}$ = mean of the values of Y = 6.6
$s_x$ = standard deviation of the values of X = 3.6
$s_y$ = standard deviation of the values of Y = 1.2

The least-squares regression line has equation (circle the correct answer):
a) $\hat{y} = 4.65 + 0.3x$
b) $\hat{y} = 4.65 - 0.3x$
c) $\hat{y} = 0.3 + 4.65x$
d) $\hat{y} = 4.65 + 2.7x$
e) $\hat{y} = 2.7 + 4.65x$

9) Using least-squares regression, it is determined that the logarithm (base 10) of the population of a country is approximately described by the equation log(population) = −13.5 + 0.01x, where x is the year. Based on this equation, the population of the country in the year 2000 should be about
a) 7.5
b) 665
c) 2,000,000
d) 3,167,277

10) Assume that the scatterplot of (log x, log y) appears linear. Then the scatterplot of (x, y) will look
a) linear.
b) quadratic.
c) exponential.
d) logarithmic.
e) No discernible pattern will be recognizable.

11) The following is a two-way table describing the age and marital status of American women in 1995. The table entries are in thousands of women.

                         Marital status
Age (years)   Never married   Married   Widowed   Divorced    Total
18–24                 9,289     3,046        19        260   12,614
25–39                 6,948    21,437       206      3,408   31,999
40–64                 2,307    26,679     2,219      5,508   36,713
≥ 65                    768     7,767     8,636      1,091   18,262
Total                19,312    58,929    11,080     10,267   99,588

What percentage of the women aged 25–39 have never married?
a) 48%
b) 22%
c) 12%
d) 36%

Section 2: Free-Response Questions

12) Suppose that you have 20 data pairs, all of which lie on the straight line whose equation is $\hat{y} = -1.2x + 40$. Compute the correlation of these data.
Ans: ______________________________

13) Suppose that you have data pairs $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ with correlation $r = -0.85$ and whose regression line has equation $\hat{y} = 4.7 - 2.1x$. Compute $s_y$ given that $s_x = 2.8$.
Ans: ______________________________

14) Suppose that you have 20 data pairs, all of which lie on the straight line whose equation is $\hat{y} = 1.2x + 21.3$. Assume that the mean of the 20 x-values is $\bar{x} = 4.3$ with sample standard deviation $s_x = 1.8$. Compute the mean and sample standard deviation of the 20 y-values.
$\bar{y}$ = _________________ $s_y$ = _________________

15) John's parents recorded his height at various ages up to 66 months. Below is a record of the results.

Age (months)       36   48   54   60   66
Height (inches)    35   38   41   43   45

a) Compute the correlation coefficient of these data, using age as the explanatory variable.
Ans: ______________________________
b) Compute the equation of the regression line.
Ans: ______________________________
c) Use the regression line to extrapolate John’s height at age 6 years.
Ans: ______________________________

16) The scatterplot below plots the city miles per gallon on the horizontal axis versus the highway miles per gallon on the vertical axis for 17 automobiles.

[Scatterplot: highway miles per gallon (vertical axis) versus city miles per gallon (horizontal axis)]

a) On the graph above, sketch the regression line that best fits these data.
b) Suppose that the actual data resulting in the above scatterplot are
{(11, 14), (14, 17), (16, 18), (16, 17), (17, 18), (17, 19), (17, 17), (19, 21), (19, 20), (20, 21), (20, 22), (21, 19), (21, 22), (24, 21), (25, 28), (28, 29), (29, 31)}
i) Use your calculator to compute the equation of the regression line.
Ans: ______________________________
ii) Use your calculator to compute the correlation.
Ans: _______________________________

17) Suppose that the scatterplot of log y versus x revealed close to a linear relationship with regression equation log y = 12.3 − 3.2x. Give a prediction of the response variable y given x = 2.1.
Ans: _______________________________

18) Draw an example of a scatterplot (containing at least 10 data pairs) whose correlation r is negative and such that |r| is fairly close to one.
19) Below is a scatterplot together with a particular point indicated, marked “x.”
a) Directly on the graph above, sketch a possible line of regression for the data.
b) Directly on the graph above, sketch a possible line of regression through the data with the point x excluded.
c) Is the point marked “x” influential? In addition to answering “yes” or “no,” explain what this means.

20) Animal-waste lagoons and spray fields near aquatic environments may significantly degrade water quality and endanger health. The National Atmospheric Deposition Program has monitored the atmospheric ammonia at swine farms since 1978. The data on the swine population size (in thousands) and atmospheric ammonia (in parts per million) for one decade are given below.

Year   Swine Population   Atmospheric Ammonia
1988         0.38                0.13
1989         0.50                0.21
1990         0.60                0.29
1991         0.75                0.22
1992         0.95                0.19
1993         1.20                0.26
1994         1.40                0.36
1995         1.65                0.37
1996         1.80                0.33
1997         1.85                0.38

a) Construct a scatterplot for these data. Be sure to label the axes and give the units of measurement.
b) Compute the correlation coefficient.
Ans: ______________________________
c) Compute the equation of the regression line.
Ans: ______________________________
d) Based on the data (and your work), does it appear that the amount of atmospheric ammonia is linearly related to the swine population size?
Ans: ______________________________
e) What percent of the variability of the atmospheric ammonia can be explained by the swine population size?
Ans: ______________________________

21) The following two-way table is extracted from Moore and McCabe’s Introduction to the Practice of Statistics.
Years of School Completed, by Age

                                           Age group
Education                       25 to 34   35 to 54   55 and over     Total
Did not complete high school       5,325      9,152        16,035    30,512
Completed only high school        14,061     24,070        18,320    56,451
College 1 to 3 years              11,659     19,926         9,662    41,247
College graduate                  10,342     19,878         8,005    38,225
Total                             41,388     73,028        52,022   166,438

a) What are the two categorical variables in this study?
Ans: _______________________________________________________________________
b) Which age group has the highest percentage of college graduates? What is this percentage?
Ans: _______________________________________________________________________
c) Plot a bar graph for the marginal distribution of levels of education.

[Blank axes for the bar graph; vertical axis: relative frequency]

22) Commercial airlines need to know the operating cost per hour of flight for each plane in their fleet. In a study of the relationship between operating cost per hour and the number of passenger seats, investigators computed the regression of operating cost per hour on the number of passenger seats. The 12 sample aircraft used in the study included planes with as few as 216 passenger seats and planes with as many as 410 passenger seats. Operating cost per hour ranged between $3,600 and $7,800. Some computer output from the regression analysis of these data is shown below.

[Scatterplot: operating cost per hour ($1000s) versus number of passenger seats]

Predictor     Coef     StDev      T       P
Constant      1136      1226   0.93   0.376
Seats       14.673     4.027   3.64   0.005

S = 845.3   R-Sq = 57.0%   R-Sq (adj) = 52.7%

a) What is the equation of the least squares regression line that describes the relationship between operating cost per hour and the number of passenger seats in the plane? Define any variables used in this equation. Also, sketch this regression line on the graph above.
Ans: __________________________________________________
b) What is the value of the correlation coefficient for operating cost per hour and the number of passenger seats in the plane? Interpret this correlation.
Correlation: _______________________
c) Suppose that you want to describe the relationship between operating cost per hour and the number of passenger seats in the plane for planes only in the range of 250 to 350 seats. Does the line shown in the scatterplot still provide the best description of the relationship for data in this range? Why or why not?

23) Two pain relievers, A and B, are being compared for relief of postsurgical pain. Twenty different strengths (doses in milligrams) of each drug were tested. Eight hundred postsurgical patients were randomly divided into 40 different groups. Twenty groups were given drug A. Each group was given a different strength. Similarly, the other twenty groups were given different strengths of drug B. Strengths used ranged from 210 to 400 milligrams. Thirty minutes after receiving the drug, each patient was asked to describe his or her pain relief on a scale of 0 (no decrease in pain) to 100 (pain totally gone). The strength of the drug given in milligrams and the average pain rating for each group are shown in the scatterplot below. Drug A is indicated with a's and drug B with b's.

[Scatterplot: pain relief versus strength (in milligrams)]

a) Based on the scatterplot, describe the effect of drug A and how it is related to strength in milligrams.
b) Based on the scatterplot, describe the effect of drug B and how it is related to strength in milligrams.
c) Which drug would you give and at what strength, if the goal is to get pain relief of at least 50 at the lowest possible strength? Justify your answer based on the scatterplot.

24) Lydia and Bob were searching the Internet to find information on air travel in the United States. They found data on the number of commercial aircraft flying in the United States during the years 1990–1998. The dates were recorded as years since 1990. Thus, the year 1990 was recorded as year 0. They fit a least-squares regression line to the data.
The graph of the residuals and part of the computer output for their regression are given below.

[Residual plot: residuals versus years since 1990]

Predictor       Coef    Stdev   t-ratio       p
Constant     2939.93    20.55    143.09   0.000
Years        233.517    4.316     54.11   0.000

s = 33.43

a) Is a line an appropriate model to use for these data? What information tells you this?
b) What is the value of the slope of the least-squares regression line? Interpret the slope in the context of this situation.
c) What is the value of the intercept of the least-squares regression line? Interpret the intercept in the context of this situation.
d) What is the predicted number of commercial aircraft flying in 1992?
e) What was the actual number of commercial aircraft flying in 1992?

25) A least squares linear regression of the percentage of weeds killed on the number of teaspoons of weed killer used was performed, and part of the resulting computer output is shown below.

Dependent variable is: percent killed
R squared = 97.2%   R squared (adjusted) = 96.9%
s = 4.505 with 14 − 2 = 12 degrees of freedom

Source        Sum of Squares   df   Mean Square   F-ratio
Regression           8330.16    1       8330.16       410
Residual             243.589   12       20.2990

Variable         Coefficient   s.e. of Coeff   t-ratio       Prob
Constant            −20.5893           3.242     −6.35   ≤ 0.0001
No. Teaspoons        24.3929           1.204      20.3   ≤ 0.0001

[Residual plot: residuals versus predicted values]

a) What is the equation of the least-squares regression line given by this analysis? Define any variables used in this equation.
b) If someone uses this equation to predict the percentage of weeds killed when 2.6 teaspoons of weed killer are used, which of the following would you expect? (Check one)
_____ The prediction will be too large.
_____ The prediction will be too small.
_____ A prediction cannot be made based on the information given on the computer output.
Explain your reasoning.

26) A simple random sample of 9 students was collected from a large university.
Each of these students reported the number of hours he or she had allocated to studying and the number of hours allocated to work each week. A least squares linear regression was performed and part of the resulting computer output is shown below.

Predictor     Coef    StDev      T       P
Constant     8.107    2.731   2.97   0.021
Work        0.4919   0.1950   2.52   0.040

S = 4.349   R-Sq = 47.6%   R-Sq (adj) = 40.1%

The scatterplot below displays the data that were collected from the 9 students.

a) After point P, labeled on the graph above, was removed from the data, a second linear regression was performed and the computer output is shown below.

Predictor     Coef    StDev      T       P
Constant    11.123    3.986   2.79   0.032
Work        0.1500   0.3834   0.39   0.709

S = 4.327   R-Sq = 2.5%   R-Sq (adj) = 0.0%

Does the point P exercise a large influence on the regression line? Explain.

b) The researcher who conducted the study discovered that the number of hours spent studying reported by the student represented by P was recorded incorrectly. The corrected data point for this student is represented by the letter Q in the scatterplot below. Explain how the least squares regression line for the corrected data (in this part) would differ from the least squares regression line for the original data.
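
Supplementary sketch for checking calculator work: the Python below is one possible way to evaluate the formulas listed under "A Few Formulas" (sample standard deviation, correlation r, slope b = r s_y / s_x, intercept a = ȳ − b x̄). The function names and the five data pairs are hypothetical, chosen only for illustration; they are not taken from any question in this test bank.

```python
import math


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def sample_sd(values):
    """Sample standard deviation, dividing by n - 1."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def correlation(xs, ys):
    """Correlation r as the average product of standardized values (n - 1 divisor)."""
    mx, my = mean(xs), mean(ys)
    sx, sy = sample_sd(xs), sample_sd(ys)
    products = (((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys))
    return sum(products) / (len(xs) - 1)


def least_squares_line(xs, ys):
    """Return (a, b) for y-hat = a + b*x, using b = r*s_y/s_x and a = y-bar - b*x-bar."""
    b = correlation(xs, ys) * sample_sd(ys) / sample_sd(xs)
    a = mean(ys) - b * mean(xs)
    return a, b


# Hypothetical data pairs, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = correlation(xs, ys)
a, b = least_squares_line(xs, ys)
print(f"r = {r:.4f}, y-hat = {a:.2f} + {b:.2f} x")
```

Swapping in a different list of pairs for xs and ys gives a quick cross-check of a hand or calculator computation; r squared gives the fraction of variability in y explained by the line.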