PROBLEM SET #2

1. In simple linear regression analysis, the dependent variable
   a) is the variable that changes in response to changes in an independent variable.
   b) is the variable whose changes affect the dependent variable.
   c) is the right-hand side variable.
   d) can be either variable.

2. In simple linear regression analysis, the independent variable is
   a) the variable that changes in response to changes in an independent variable.
   b) the variable whose changes affect the dependent variable.
   c) the left-hand side variable.
   d) either variable.

3. The predicted value of \(y_i\) is
   a) the value that \(y_i\) takes on when \(x_i\) equals 0.
   b) the effect that a one-unit change in the dependent variable is expected to have on the independent variable, holding all else constant.
   c) the value of \(y_i\) when the slope is multiplied by a specific \(x_i\) and then that value is added to the intercept.
   d) the observed value of the dependent variable that is associated with a specific value of the independent variable.

4. The error term includes all of the following except
   a) omitted variables.
   b) deterministic relationships.
   c) incorrect functional form.
   d) measurement error.

5. The estimated slope coefficient is
   a) the estimated marginal effect of \(x\) on \(y\).
   b) the estimated value of \(y\) when \(x\) equals 0.
   c) equal to the population slope coefficient.
   d) \(\hat{\beta}_0\).

6. The residual is
   a) the difference between the observed value of the dependent variable and the observed value of the independent variable.
   b) the difference between the observed value of the dependent variable and the predicted value of the dependent variable.
   c) the difference between the predicted value of the dependent variable and the predicted value of the independent variable.
   d) the difference between the predicted value of the independent variable and the observed value of the independent variable.

7. Suppose you determine the estimated sample regression function to be \(\hat{y}_i = 1{,}016.82 + 473.65 \cdot x_i\). You would conclude that
   a) \(y_i\) is estimated to equal 473.65 when \(x_i = 0\).
   b) \(y_i\) is estimated to increase by 473.65 for every one-unit increase in \(x_i\).
   c) \(y_i\) is estimated to decrease by 473.65 for every one-unit increase in \(x_i\).
   d) \(y_i\) is estimated to increase by 1,016.82 for every one-unit increase in \(x_i\).

8. The term “goodness-of-fit” refers to
   a) the accuracy of the estimated sample regression function.
   b) whether or not the estimated sample regression function is correct.
   c) the method by which we determine the best-fit line.
   d) the extent to which observed data match the values expected by theory.

9. The explained variation in \(y\) is
   a) the distance between the best-fit line and the data points.
   b) the distance between the mean and the predicted value of \(y\).
   c) the distance between the mean and the data points.
   d) the distance between the observed and predicted values of \(y\).

10. The unexplained variation in \(y\) is
   a) the distance between the observed and predicted values of \(y\).
   b) the distance between the mean and the best-fit line.
   c) the distance between the mean and the data points.
   d) the distance between the mean and the predicted value of \(y\).

11. The coefficient of determination (\(R^2\)) is
   a) the ratio of the unexplained variation in \(y\) to the total variation in \(y\).
   b) the ratio of the explained variation in \(y\) to the total variation in \(y\).
   c) the ratio of the explained variation in \(y\) to the unexplained variation in \(y\).
   d) the ratio of the unexplained variation in \(y\) to the explained variation in \(y\).

12. In general, a larger \(R^2\) tends to suggest that
   a) the estimated sample regression function explains a greater percentage of the total variation in \(y\).
   b) the estimated sample regression function is more accurate.
   c) the estimated sample regression function explains a greater percentage of the explained variation in \(y\).
   d) the estimated slope coefficient is more likely to equal the population slope coefficient.

13. The standard error of the estimated sample regression function (\(s_{y|x}\)) is
   a) the square root of the unexplained sum of squares.
   b) the square root of the explained sum of squares.
   c) the square root of the unexplained sum of squares divided by the degrees of freedom of the regression.
   d) the square root of the explained sum of squares divided by the degrees of freedom.

14. In general, a larger \(s_{y|x}\) tends to suggest that
   a) the estimated sample regression function explains a greater percentage of the total variation in \(y\).
   b) the estimated sample regression function is more accurate.
   c) the data points fall closer to the best-fit line.
   d) the data points fall further from the best-fit line.

15. Suppose you wish to determine the degree to which annual earnings of PGA tour players (Earnings) are related to driving distance (Yards Per Drive). In such a case,
   a) you should define Yards Per Drive as the dependent variable.
   b) you should define Earnings as the dependent variable.
   c) it does not matter which variable you define as the dependent variable.
   d) you should define Earnings as the independent variable.

16. Suppose you are given the Excel output in Figure 4.1. You would conclude that each additional Yard Per Drive is estimated to be associated with
   a) a $7,773,135.558 decrease in annual earnings.
   b) a $2,867,254.773 increase in annual earnings.
   c) a $30,737.523 increase in annual earnings.
   d) a $9,823.548 increase in annual earnings.

17. Suppose you are given the Excel output in Figure 4.1. You would conclude that the estimated sample regression function explains
   a) 21.65 percent of the total variation in annual earnings.
   b) 4.69 percent of the total variation in annual earnings.
   c) 4.21 percent of the total variation in annual earnings.
   d) 0.02 percent of the total variation in annual earnings.

18. Suppose you are given the Excel output in Figure 4.1. You would conclude that the standard error of the estimated sample regression function (\(s_{y|x}\)) is
   a) 0.2165.
   b) 0.0469.
   c) 1,155,621.367.
   d) 201.

19. Suppose you are given the Excel output in Figure 4.1. You would conclude that the explained sum of squares is
   a) 1.307E+13.
   b) 2.657E+14.
   c) 1.200E+13.
   d) you cannot tell from the information given.

20. Suppose you are given the Excel output in Figure 4.1. You would conclude that the number of golfers in the sample is
   a) 1.
   b) 199.
   c) 200.
   d) 201.

21. Suppose you are given the Excel output in Figure 4.1. You would conclude that the number of degrees of freedom of the regression is
   a) 1.
   b) 199.
   c) 200.
   d) 201.

Calculations

1. A counselor working with teenagers is interested in the relationship between anxiety and depression. The counselor administers a depression inventory and an anxiety inventory to each teenager in a random sample. The scores obtained from the two inventories are summarized below.

                 Sample mean    Standard deviation
   Anxiety       25.2632        21.9895
   Depression    13.7895        9.8464

   \(\sum_{i=1}^{19} (x_i - \bar{x})(y_i - \bar{y}) = 3828.0526\)

   a. If anxiety is the independent variable and depression is the dependent variable, what is the sample regression function? What do the estimated slope and intercept mean in the context of this problem? Is the intercept meaningful? Why or why not?
   b. What is R-squared and what does it mean?
   c. If an individual had an anxiety score of 40, what is their predicted level of depression?

2. Consider the regression model \(Y = A + BX + \varepsilon\), where \(Y\) is the total cost of publishing a book and \(X\) is the number of pages in the book. You wish to estimate the regression based on 500 different books. A spreadsheet calculation with these 500 observations gave the following results:

   \(\sum X_i = 1500\), \(\sum Y_i = 700\), \(\sum X_i Y_i = 9000\), \(\sum X_i^2 = 33000\), \(\sum Y_i^2 = 3600\)

   a. Determine and interpret the least squares regression equation.
   b. Find the following: ESS, TSS, USS.
   c. Find the estimate of the standard error of the regression (\(S_e\)).
   d. Find and interpret the \(R^2\). What are the units of measurement for the \(R^2\)?
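
The quantities requested in the calculation problems (slope, intercept, ESS, TSS, USS, the standard error of the regression, and R-squared) can all be derived from raw sums of the kind a spreadsheet reports. The following is a minimal Python sketch of that arithmetic, not an official solution method. The function name `ols_from_sums` and the four data points are hypothetical and chosen only for illustration, and the standard error is assumed to use n - 2 degrees of freedom, as is usual for simple linear regression.

```python
import math


def ols_from_sums(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2):
    """Simple linear regression from raw summary sums (hypothetical helper)."""
    # Centered sums of squares and cross-products.
    s_xx = sum_x2 - sum_x ** 2 / n      # sum of (x - x_bar)^2
    s_xy = sum_xy - sum_x * sum_y / n   # sum of (x - x_bar)(y - y_bar)
    tss = sum_y2 - sum_y ** 2 / n       # total sum of squares of y

    # Least squares estimates.
    slope = s_xy / s_xx
    intercept = sum_y / n - slope * sum_x / n

    # Decomposition of the total variation in y.
    ess = slope * s_xy                  # explained sum of squares
    uss = tss - ess                     # unexplained sum of squares

    return {
        "slope": slope,
        "intercept": intercept,
        "TSS": tss,
        "ESS": ess,
        "USS": uss,
        "R2": ess / tss,
        # Assumption: n - 2 degrees of freedom (two estimated parameters).
        "Se": math.sqrt(uss / (n - 2)),
    }


# Made-up illustrative data, not taken from the problem set.
xs = [1, 2, 3, 4]
ys = [2, 3, 5, 6]

fit = ols_from_sums(
    n=len(xs),
    sum_x=sum(xs),
    sum_y=sum(ys),
    sum_xy=sum(x * y for x, y in zip(xs, ys)),
    sum_x2=sum(x * x for x in xs),
    sum_y2=sum(y * y for y in ys),
)

for name, value in fit.items():
    print(f"{name}: {value:.4f}")
```

For this made-up data the fitted line is approximately y = 0.5 + 1.4x with an R-squared of about 0.98. When only sample standard deviations and the cross-product sum are given, as in the first calculation problem, the centered sum of squares can be recovered as (n - 1) times the squared standard deviation before applying the same formulas.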