Regression
BPS chapter 5
© 2006 W.H. Freeman and Company

Sum of squared errors
Which least-squares regression line would have a smaller sum of squared errors (SSE)?
a) The line in Plot A.
b) The line in Plot B.
c) It would be the same for both plots.

Scatterplots
Look at the following scatterplot. What could we say about the relationship between r and the slope of the regression line?
a) Since a is negative, r must also be negative.
b) Since b is negative, r must also be negative.
c) Since a is positive, r must also be positive.
d) Since b is positive, r must also be positive.

Slope
Look at the following scatterplot. What would be a correct interpretation of the slope?
a) As we increase our CO content by 1 mg, we increase the tar content by 1.01 mg.
b) As we increase our CO content by 0.66 mg, we increase the tar content by 1.01 mg.
c) As we increase our CO content by 0.66 mg, we increase the tar content by 0.66 mg.
d) As we increase our CO content by 1 mg, we increase the tar content by 0.66 mg.
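The SSE and sign questions above can be made concrete with a small computation. The following is a minimal Python sketch, not taken from the textbook, that fits a least-squares line by hand, computes its SSE, and checks that the slope b equals r·s_y/s_x, which is why b and r always share a sign. The function names and the data sets `ys_tight` and `ys_loose` are hypothetical stand-ins for a tightly clustered plot and a widely scattered one.

```python
import math

def least_squares(xs, ys):
    """Return (a, b, r): intercept and slope of y-hat = a + b*x, and the correlation r."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx                      # slope
    a = y_bar - b * x_bar              # intercept
    r = sxy / math.sqrt(sxx * syy)     # correlation
    return a, b, r

def sse(xs, ys, a, b):
    """Sum of squared errors: squared vertical distances from each point to the line."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

def sd(values):
    """Sample standard deviation (divisor n - 1)."""
    m = sum(values) / len(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

# Hypothetical data: same x values, one tight and one loose scatter of y.
xs = [1, 2, 3, 4, 5]
ys_tight = [2.1, 3.9, 6.2, 7.8, 10.0]
ys_loose = [2, 7, 4, 10, 7]

for label, ys in [("tight", ys_tight), ("loose", ys_loose)]:
    a, b, r = least_squares(xs, ys)
    # The slope is r rescaled by s_y / s_x, so b and r always have the same sign.
    assert math.isclose(b, r * sd(ys) / sd(xs))
    print(f"{label}: a={a:.3f}, b={b:.3f}, r={r:.3f}, SSE={sse(xs, ys, a, b):.3f}")
```

With these made-up numbers the tight scatter has an SSE of about 0.09 and the loose scatter about 21.1: the closer the points hug the line, the smaller the SSE.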

Regression line
Look at the following least-squares regression line. If a person increased his or her weight by 10 pounds, by how much (in inches) would one expect their waist girth to increase?
a) 0.1332
b) 1.332
c) 99.994
d) 1.332 + 9.9994
Answer: b) 1.332, the slope (0.1332 inch per pound) times the 10-pound change.

Regression line
Look at the following least-squares regression line. The y-intercept tells us the predicted waist girth for someone weighing how many pounds?
a) 0
b) 0.1332
c) 9.9994
d) Cannot be determined from the graph.
Answer: a) 0. The intercept is the predicted y when x = 0.

Residuals
Look at the following least-squares regression line. Compare the squared errors (residuals) from the two points A and B.
a) Point A’s would be greater than Point B’s.
b) Point A’s would be less than Point B’s.
c) Point A’s would be equal to Point B’s.
d) There is not enough information.

Percent of variation in Y
What percent of the variation in the sisters’ heights can be explained by the heights of the brothers?
a) 25.64%
b) (0.558)² = 31.14%
c) 52.7%
d) 55.8%
Answer: b) (0.558)² = 31.14%. The fraction of variation explained is r², not r.

Correlation
The correlation between math SAT score and total SAT score is about r = 0.9935. What is a correct conclusion that could be made?
a) The least-squares regression line of Y on X would have slope = 0.9935.
b) Math SAT scores explain about 98.7% (which is 0.9935²) of the variation in the total SAT scores.
c) About 99.35% of the time math SAT scores will accurately predict total SAT scores.
d) Total SAT score is made up of 99.35% of the math SAT score.
Answer: b)

Residuals
Residual equals
Answer: observed y − predicted y, that is, y − ŷ.

Residual plots
Residual plots are used to
a) Examine the relationship between two variables.
b) Identify the mean and spread of the residuals.
c) Check for independence of observations.
d) Magnify violations of regression assumptions.

Residual plots
The following are regression assumptions:
1. The relationship between X and Y can be modeled with a straight line.
2. The variation in the Y values does not depend on the value of X (constant variance).
The residual plot shown below indicates the violation of which regression assumption?
a) 1
b) 2
c) Neither

Residual plots
The following are regression assumptions:
1. The relationship between X and Y can be modeled with a straight line.
2. The variation in the Y values does not depend on the value of X (constant variance).
The residual plot shown below indicates the violation of which regression assumption?
a) 1
b) 2
c) Neither

Correlation or regression
Which of the following measures the direction and strength of the linear association between X and Y?
a) Correlation
b) Regression
Answer: a) Correlation

Correlation or regression
Which of the following makes no distinction between explanatory and response variables?
a) Correlation
b) Regression
Answer: a) Correlation

Correlation or regression
Which of the following is used for prediction?
a) Correlation
b) Regression
Answer: b) Regression

Regression line
A regression line always passes through the point
Answer: (x̄, ȳ), the point given by the means of X and Y.

Correlation and slope
Which of the following best describes the relationship between correlation and slope?
a) The correlation of X and Y equals the slope of the regression line modeling the relationship between X and Y.
b) When the correlation between X and Y is zero, the slope of the regression line modeling the relationship between X and Y is negative.
c) The sign of the correlation between X and Y is the same as the sign of the slope of the regression line modeling the relationship between X and Y.
d) The correlation between X and Y is not related to the slope of the regression line modeling the relationship between X and Y.
Answer: c). The slope is b = r·s_y/s_x, and the standard deviations s_x and s_y are positive, so b and r have the same sign.

Regression line
Which of the following best measures the strength of fit of a regression line?
a) Correlation coefficient, r.
b) Square of the correlation coefficient, r².
c) Square root of the correlation coefficient, √r.
Answer: b) r²
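The questions above about residuals, the point a regression line passes through, and r² can be checked numerically. The following Python sketch uses hypothetical helper names and made-up data. It computes residuals as y − ŷ, confirms that the least-squares line passes through (x̄, ȳ) so the residuals sum to zero, and confirms that 1 − SSE/SST equals r². It then redoes the slide arithmetic, taking as an assumption the waist-girth line ŷ = 9.9994 + 0.1332 × weight suggested by the answer choices.

```python
import math

def fit_line(xs, ys):
    """Least-squares intercept a and slope b for y-hat = a + b*x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    return y_bar - b * x_bar, b

def residuals(xs, ys, a, b):
    """Residual = observed y - predicted y-hat, one per observation."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]

def correlation(xs, ys):
    """Pearson correlation r computed from sums of squares and cross-products."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)

xs = [1, 2, 3, 4, 5]           # made-up data
ys = [2, 7, 4, 10, 7]
a, b = fit_line(xs, ys)
res = residuals(xs, ys, a, b)
x_bar, y_bar = sum(xs) / len(xs), sum(ys) / len(ys)

# The least-squares line always passes through (x-bar, y-bar) ...
assert math.isclose(a + b * x_bar, y_bar)
# ... and its residuals sum to (essentially) zero.
assert abs(sum(res)) < 1e-9

# r^2 is the fraction of the variation in y explained by the line: 1 - SSE/SST.
sse = sum(e ** 2 for e in res)
sst = sum((y - y_bar) ** 2 for y in ys)
assert math.isclose(1 - sse / sst, correlation(xs, ys) ** 2)

# Slide arithmetic. ASSUMED waist-girth line: y-hat = 9.9994 + 0.1332 * weight.
def waist(weight):
    return 9.9994 + 0.1332 * weight

print(f"{waist(160) - waist(150):.3f}")  # change for +10 lb = 10 * slope
print(f"{0.558 ** 2:.2%}")               # brothers/sisters: r^2
print(f"{0.9935 ** 2:.1%}")              # math SAT vs total SAT: r^2
```

The last three lines print 1.332, 31.14% and 98.7%. The intercept cancels out of the waist-girth difference, so only the slope matters for the 10-pound question.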

Causation
Researchers interviewed a group of women with knee pain awaiting knee replacement surgery. They also interviewed a group of women from the same geographical area with no knee pain. These researchers reported that wearing high-heeled shoes caused the knee pain that required surgery. As a savvy consumer of statistics, you conclude that:
a) Because this was only an observational study, the researchers should not make claims that the knee pain was caused by high heels.
b) Because the study was a valid experiment, the researchers were valid in their claim about high heels causing pain.
Answer: a)

Linear regression
The following graph indicates the presence of
a) Extrapolation.
b) An influential observation.
c) A lurking variable.

Linear regression
The following graph shows the linear relationship between diamond size and price for diamonds of 0.35 carats or less. Using this relationship to predict the price of a 1-carat diamond is considered
a) Extrapolation.
b) An influential observation.
c) Prediction.
Answer: a) Extrapolation. A 1-carat diamond lies far outside the range of sizes used to fit the line.

Linear regression
The diamonds mentioned in the previous question were of the same cut and clarity. If diamonds of different cuts have different relationships between size and price, we would say that
a) Type of cut is a lurking variable.
b) Type of cut is a confounded variable.
c) Type of cut should be ignored.
Answer: a) Type of cut is a lurking variable.
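The diamond question turns on extrapolation: a fitted line says nothing reliable about x values outside the range it was fitted on. One way to make that visible is to keep the fitted x range with the line and flag any prediction outside it. The following Python sketch does this under stated assumptions. `FittedLine` and `fit` are hypothetical names, and the sizes and prices are toy numbers in arbitrary units, not the textbook's diamond data.

```python
from __future__ import annotations
from dataclasses import dataclass

@dataclass
class FittedLine:
    """A least-squares line y-hat = a + b*x, remembering the x range it was fitted on."""
    a: float
    b: float
    x_min: float
    x_max: float

    def predict(self, x: float) -> tuple[float, bool]:
        """Return (prediction, is_extrapolation); True means x lies outside the fitted range."""
        return self.a + self.b * x, not (self.x_min <= x <= self.x_max)

def fit(xs: list[float], ys: list[float]) -> FittedLine:
    """Fit the least-squares line and record min(xs) and max(xs)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    return FittedLine(a=y_bar - b * x_bar, b=b, x_min=min(xs), x_max=max(xs))

# Toy data in arbitrary price units, sizes only up to 0.35 carat.
sizes = [0.10, 0.20, 0.30, 0.35]
prices = [1.0, 2.0, 3.0, 3.5]
line = fit(sizes, prices)

for carats in (0.25, 1.0):
    price, extrapolated = line.predict(carats)
    note = "EXTRAPOLATION: outside fitted range" if extrapolated else "within fitted range"
    print(f"{carats} carat -> {price:.2f} ({note})")
```

The line still returns a number for 1 carat; the flag is the point. Nothing in data that stop at 0.35 carat shows that the straight-line pattern continues out to 1 carat.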