Chapter 3 Review KEY

1. List three synonyms for the x-variable.
   Explanatory, Independent, Control, Predictor, Input.
2. List three synonyms for the y-variable.
   Response, Dependent, Output.
3. r is called: Correlation Coefficient
4. r^2 is called: Coefficient of Determination
5. What information must be included in a description of the relationship between two variables?
   Form (overall pattern), Direction, Strength.
6. LSRL stands for: Least-Squares Regression Line
7. When calculating the equation of the LSRL from summary statistics, what information is needed?
   r, Sy, Sx, x-bar, y-bar.
8. Calculate an r from the following points by hand. At any stage where you may need to round, round to the nearest tenth.
   r = .5 using (0, 0), (1, 2), (2, 1).
9. Describe in words what a residual represents.
   It's the difference between the observed y and the y predicted by the regression line.
10. In a regression line, the sum of all residuals = 0
11. Explain the meaning of r-squared.
   It's the proportion of variability in the response variable explained by the linear regression of the response variable onto the explanatory variable.
12. Explain why changing units of measure has no effect on r.
   No effect, because changing units constitutes a linear transformation, and linear transformations don't affect r. OR, better: because r is based on standardized values, and because converting numbers into z-scores absorbs any units of measure, r doesn't care about units of measure one way or another.
13. A point is removed from a scatterplot. The point is very close to the regression line, but way out in the "outfield." Will r increase or decrease?
   Decrease.
14. A point is removed from a scatterplot, but in this case the point is offset from the regression line by a considerable distance. Will r increase or decrease?
   Increase.
15. A point is smack dab in the middle of the bunch, and it lies right on the regression line. If you remove it, will r go up or down?
   Up. It contributes little to the sum of products of standardized x and y, but it does add one to the denominator. Removing it doesn't change the numerator, but the denominator goes down. Ergo, the fraction goes up.
15. In a residual plot, what is the preferred pattern if you hope to have faith in how well a regression line can be used for prediction all along the domain?
   No pattern at all.
16. You have a point (15, 18) found in some data whose regression line is y-hat = 14 - 3x. Find the residual for this point.
   Residual = observed y - predicted y = 18 - (-31) = 49. (Check my work. I was going quickly.)
17. You're given this data: (2, 3), (4, 7), and (-3, -1). In under 30 seconds, without a calculator, find one other point on the regression line for this data.
   (1, 3), the point (x-bar, y-bar).
18. You look at a residual plot and see that residuals begin positive, then go negative, then pass to positive. What do you conclude?
   A line is not the right regression (or model) to use.
19. You're given this info: r = .5, x-bar = 4, y-bar = 8, Sx = .3, Sy = .8. Find the residual for the point (1, 3).
   -1
20. You're performing a study relating years of driving and accident incidence during the first five years of driving.
   Name a lurking variable that might operate through common effect.
      (Bad question. Tough to find a variable that would affect time :)
   Name a lurking variable that might be a confound not operating through common effect.
      Type of car, age of driver, climate in which driving occurs. Many choices...
21. You're performing a study on children and believe that the ingestion of melamine in milk has a deleterious effect on growth. For obvious reasons, you can't perform an experiment. Nonetheless, you wish to establish a strong case for melamine "causing" stunted growth. List four of the criteria you would have to satisfy in order to begin building a case for causality, in the context of this situation.
   Multiple studies showing the same relationship between melamine ingestion and growth; melamine ingestion precedes growth stunting in time; effect on growth is proportional to the amount of melamine ingested (more melamine, less growth); and a strong r-value between melamine amount and growth "loss."
22. You're using family income (in dollars) to predict the size of homes (in square feet). You find an r value of .43. In your follow-on study, you wish to convert your study for a European audience. There are 1.2 Euros per dollar, and 10.1 square feet per square meter. Find your new r value.
   r = .43, no change! Linear transformations do not affect r.

Practice Multiple Choice (Correct Answer is in Bold)

1. In a statistics course, a linear regression equation was computed to predict the final exam score from the score on the first test. The equation was y = 10 + .9x, where y is the final exam score and x is the score on the first test. Carla scored 95 on the first test. What is the predicted value of her score on the final exam?
   (a) 95
   (b) 85.5
   (c) 90
   (d) 95.5
   (e) None of the above
2. Refer to the previous problem. On the final exam Carla scored 98. What is the value of her residual?
   (a) 98
   (b) 2.5
   (c) -2.5
   (d) 0
   (e) None of the above
3. A study of the fuel economy for various automobiles plotted the fuel consumption (in liters of gasoline used per 100 kilometers traveled) vs. speed (in kilometers per hour). A least squares line was fit to the data. Here is the residual plot from this least squares fit. What does the pattern of the residuals tell you about the linear model?
   [Figure: residual plot (calculator screen, RESID plotted against L1)]
   (a) The evidence is inconclusive.
   (b) The residual plot confirms the linearity of the fuel economy data.
   (c) The residual plot does not confirm the linearity of the data.
   (d) The residual plot clearly contradicts the linearity of the data.
   (e) None of the above
4. All but one of the following statements contains a blunder. Which statement is correct?
   (a) There is a correlation of 0.54 between the position a football player plays and their weight.
   (b) The correlation between planting rate and yield of corn was found to be r = 0.23.
   (c) The correlation between the gas mileage of a car and its weight is r = 0.71 MPG.
   (d) We found a high correlation (r = 1.09) between the height and age of children.
   (e) We found a correlation of r = -.63 between gender and political party preference.
5. After a linear regression, it was found that the r-value was .65. If each x-value were decreased by one unit and the y-values remained the same, then the correlation r would
   (a) Decrease by 1 unit
   (b) Decrease slightly
   (c) Increase slightly
   (d) Stay the same
   (e) Can't tell without knowing the data values
6. In regression, the residuals are which of the following?
   (a) Those factors unexplained by the data
   (b) The difference between the observed responses and the values predicted by the regression line
   (c) Those data points which were recorded after the formal investigation was completed
   (d) Possible models unexplored by the investigator
   (e) None of the above
7. What does the square of the correlation (r^2) measure?
   (a) The slope of the least squares regression line
   (b) The intercept of the least squares regression line
   (c) The extent to which cause and effect is present in the data
   (d) The fraction of the variation in the values of y that is explained by least-squares regression of y on the other variable.
8. Which of the following statements are true?
   I. Correlation requires one variable to be identified as the explanatory variable and the other as the response variable.
   II. A two-variable scatterplot requires that both variables be quantitative.
   III. Every least-squares regression line passes through (x-bar, y-bar).
   (a) I and II only
   (b) I and III only
   (c) II and III only
   (d) I, II, and III
   (e) None of the above.
9.
A local community college announces the correlation between college entrance exam grades and scholastic achievement was found to be -1.08. On the basis of this you would tell the college that
   (a) The entrance exam is a good predictor of success.
   (b) The exam is a poor predictor of success.
   (c) Students who do best on this exam will be poor students.
   (d) Students at this school are underachieving.
   (e) The college should hire a new statistician.
10. The following are resistant:
   (a) Least squares regression line
   (b) Correlation coefficient
   (c) Both the least squares line and the correlation coefficient
   (d) Neither the least squares line nor the correlation coefficient
   (e) It depends
11. A study found correlation r = 0.61 between the sex of a worker and his or her income. You conclude that:
   (a) Women earn more than men on average.
   (b) Women earn less than men on average.
   (c) An arithmetic mistake was made; this is not a possible value of r.
   (d) This is nonsense because r makes no sense here.
12. A copy machine dealer has data on the number x of copy machines at each of 89 customer locations and the number y of service calls in a month at each location. Summary calculations give x-bar = 8.4, Sx = 2.1, y-bar = 14.2, Sy = 3.8, and r = 0.86. What is the slope of the least squares regression line of number of service calls on number of copiers?
   (a) 0.86
   (b) 1.56
   (c) 0.48
   (d) None of these
   (e) Can't tell from the information given
13. In the setting of the previous problem, about what percent of the variation in the number of service calls is explained by the linear relation between number of service calls and number of machines?
   (a) 86%
   (b) 93%
   (c) 74%
   (d) None of these
   (e) Can't tell from the information given
14. If dataset A of (x, y) data has correlation coefficient r = 0.65, and a second dataset B has correlation r = -0.65, then
   (a) The points in A exhibit a stronger linear association than B.
   (b) The points in B exhibit a stronger linear association than A.
   (c) Neither A nor B has a stronger linear association.
   (d) You can't tell which dataset has a stronger linear association without seeing the data or seeing the scatterplots.
15. There is a linear relationship between the number of chirps made by the striped ground cricket and the air temperature. A least squares fit of some data collected by a biologist gives the model y = 25.2 + 3.3x, 9 < x < 25, where x is the number of chirps per minute and y is the estimated temperature in degrees Fahrenheit. What is the estimated increase in temperature that corresponds to an increase of 5 chirps per minute?
   (a) 3.3°F
   (b) 16.5°F
   (c) 25.2°F
   (d) 28.5°F
   (e) 41.7°F
16. Linear regression usually employs the method of least squares. Which of the following is the quantity that is minimized by the least squares process?
   (a) [formula not recoverable from the source]
   (b) [formula not recoverable from the source]
   (c) sum of (y_i - y-bar)^2
   (d) [formula not recoverable from the source]
   (e) [formula not recoverable from the source]
   (c) is the correct answer, once you've changed bars to hats: least squares minimizes the sum of (y_i - y-hat_i)^2.
17. A set of data relates the amount of annual salary raise and the performance rating. The least squares regression equation is y = 1,400 + 2,000x, where y is the estimated raise and x is the performance rating. Which of the following statements is not correct?
   (a) For each increase of one point in performance rating, the raise will increase on average by $2,000.
   (b) This equation produces predicted raises with an average residual of 0.
   (c) A rating of 0 will yield a predicted raise of $1,400.
   (d) The correlation for the data is positive.
   (e) All of the above are true.
18. Which of the following would not be a correct interpretation of a correlation of r = -.30?
   (a) The variables are inversely related.
   (b) The coefficient of determination is 0.09.
   (c) 30% of the variation between the variables is linear. (yikes!)
   (d) There exists a weak relationship between the variables.
   (e) All of the above statements are correct.
19.
If removing an observation from a data set would markedly change the position of the LSRL fit to the data, what is the point called?
   (a) Robust
   (b) A residual
   (c) A response
   (d) Influential
   (e) None of the above
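
Several of the numeric items above (review questions 8, 16, 19, and 22, plus multiple choice 1, 2, 12, and 13) can be double-checked by machine. What follows is a minimal Python sketch, not part of the original key. The helper names (`correlation`, `lsrl_from_summary`, `residual`) are made up for illustration, and the income and home-size numbers used for question 22 are hypothetical, since the question gives no data. The slope and intercept use the usual summary-statistic formulas, b = r * Sy / Sx and a = y-bar - b * x-bar.

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def correlation(xs, ys):
    """Pearson r from deviations about the means (the n - 1 factors cancel)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def lsrl_from_summary(r, x_bar, y_bar, s_x, s_y):
    """Return (intercept, slope) using b = r * Sy / Sx and a = y-bar - b * x-bar."""
    slope = r * s_y / s_x
    intercept = y_bar - slope * x_bar
    return intercept, slope


def residual(x, y, intercept, slope):
    """Observed y minus the y predicted by the line."""
    return y - (intercept + slope * x)


# Review question 8: r for the points (0, 0), (1, 2), (2, 1).
r_q8 = correlation([0, 1, 2], [0, 2, 1])

# Review question 16: point (15, 18), line y-hat = 14 - 3x.
res_q16 = residual(15, 18, intercept=14, slope=-3)

# Review question 19: build the line from summary statistics,
# then take the residual at (1, 3).
a_q19, b_q19 = lsrl_from_summary(r=0.5, x_bar=4, y_bar=8, s_x=0.3, s_y=0.8)
res_q19 = residual(1, 3, a_q19, b_q19)

# Review question 22: HYPOTHETICAL data, only to show that rescaling
# dollars to euros and square feet to square meters leaves r alone.
incomes_usd = [30000, 45000, 60000, 80000, 120000]
sizes_sqft = [900, 1500, 1400, 2200, 2600]
r_before = correlation(incomes_usd, sizes_sqft)
r_after = correlation([1.2 * v for v in incomes_usd],
                      [v / 10.1 for v in sizes_sqft])

# Multiple choice 1 and 2: prediction and residual for Carla.
predicted_final = 10 + 0.9 * 95
carla_residual = 98 - predicted_final

# Multiple choice 12 and 13: slope from summary statistics, and r squared.
_, slope_mc12 = lsrl_from_summary(r=0.86, x_bar=8.4, y_bar=14.2, s_x=2.1, s_y=3.8)
r_squared_mc13 = 0.86 ** 2

print(f"Q8   r        = {r_q8:.2f}")
print(f"Q16  residual = {res_q16}")
print(f"Q19  residual = {res_q19:.2f}")
print(f"Q22  r before = {r_before:.4f}, r after = {r_after:.4f}")
print(f"MC1  y-hat    = {predicted_final:.1f}")
print(f"MC2  residual = {carla_residual:.1f}")
print(f"MC12 slope    = {slope_mc12:.2f}")
print(f"MC13 r^2      = {r_squared_mc13:.2f}")
```

The first three printed values should agree with the key's answers of .5, 49, and -1. The question 22 check does not depend on the made-up numbers: any data set rescaled by positive constants should print the same r before and after.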