UNC-Wilmington
Department of Economics and Finance
ECN 377
Dr. Chris Dumas

Homework 10 (Due Tuesday, October 6th)

Multiple Choice

1) The purpose/goal of Regression Analysis is:
a) to fit the best line to a set of data
b) to determine the relationships among variables
c) to find the best values for the parameters in a model
d) both b and c

2) What does the error term represent in a Regression Analysis?
a) the effects of variables omitted from the model
b) the effects of measurement errors
c) the effects of data recording/copying errors
d) all of the above

3) Regression Analysis can be used to investigate:
a) only relationships that are linear in the variables
b) linear or non-linear relationships among only two variables
c) only relationships that are non-linear in the parameters
d) none of the above

4) When we use the Ordinary Least Squares (OLS) method to conduct Regression Analysis, we are calculating:
a) the model parameter values (values of β̂0 and β̂1) that maximize Y
b) the model parameter values (values of β̂0 and β̂1) that minimize the error in the model
c) the model parameter values (values of β̂0 and β̂1) that minimize the sum of the squared errors in the model
d) the model parameter values (values of β̂0 and β̂1) that maximize the slope of the model equation
e) the model parameter values (values of β̂0 and β̂1) that minimize the sum of the squared distances between the Y values in our data set and the graph of the model equation
f) both c and e
g) both b and d

5) What does the Standard Error of the Regression (SER) measure?
a) the statistical significance of model parameter values (values of β̂0 and β̂1)
b) the error of a forecast for Y that we make using a regression model
c) the variation/spread of the Y values around the graph of the regression equation
d) none of the above

6) After doing a regression analysis for your boss, he asks you: “So, how well does your regression equation fit the data?”
Which of the following could be part of your reply?
a) the SER of the final model was higher than the SER of the other models
b) the coefficient of determination of the final model was 0.72
c) the SER of the final model was lower than the SER of the other models
d) r² of the final model was 0.72
e) a, b and d are correct
f) b, c and d are correct

7) In SAS, which procedure does one use to conduct regression analysis?
a) proc ols
b) proc reg
c) proc corr
d) proc contents

8) The Gauss-Markov Theorem states: If the assumptions of the Ordinary Least Squares (OLS) method are met, then . . .
a) for a sufficiently large sample size, the distribution of Xbar will be normal
b) the OLS parameter estimates (β̂0 and β̂1) will be unbiased (correct on average)
c) the OLS parameter estimates (β̂0 and β̂1) will have lower variance (error) than the estimates that could be produced using any other linear estimator
d) both b and c

True/False, Calculation and Short Answer Section

9) True or False. Which of the following assumptions is/are behind the Ordinary Least Squares (OLS) method:
_____ in the population, the relationship between the variables is linear in the variables
_____ in the population, the relationship between the variables is linear in the parameters
_____ the X variables are correlated with the population error term e
_____ the model is correctly specified (contains the correct variables, and only the correct variables)
_____ the distribution of the error term is normal
_____ the error terms for various individuals in the population are not correlated with one another
_____ the variance of the error term is the same for all individuals in the population

10) When conducting an OLS regression analysis, several steps are required. Put the following steps in the correct order by numbering them in the blanks provided. The first step should be numbered “1,” the second step “2,” etc.
_____ Calculate the “Goodness of Fit” measures SER and R².
_____ Describe the sign and magnitude of the effect of each (statistically significant) X on Y.
_____ Use the OLS Regression Estimator Equations to estimate the β̂’s for the regression model.
_____ Check the ttest numbers for the β̂’s to determine which of the β̂’s are statistically significant.
_____ Use the F-test to determine whether the regression as a whole is statistically significant.

11) The graph below shows the components of RSS, ESS and TSS for the individual in the sample with X and Y values given by data point (Xi,Yi). (Many other data points are not shown. Each of the other data points would also have components of RSS, ESS and TSS similar to those shown below for data point (Xi,Yi).) Properly label the indicated distances on the graph with “ESS,” “RSS” or “TSS.”

[Graph: “Components of TSS, RSS and ESS.” Vertical axis Y, horizontal axis X1. Shows the regression line Ŷ = β̂0 + β̂1 ∙ X1, the data point (Xi,Yi), and the levels Yi, Ŷ and Ȳ marked on the vertical axis, with three blank distances to be labeled.]

12) The graph below shows RSS, ESS and TSS for another data point. Properly label the indicated distances on the graph with “ESS,” “RSS” or “TSS.”

[Graph: “Components of TSS, RSS and ESS.” Vertical axis Y, horizontal axis X1. Shows the regression line Ŷ = β̂0 + β̂1 ∙ X1, the data point (Xi,Yi), and the levels Ȳ, Ŷ and Yi marked on the vertical axis, with three blank distances to be labeled.]

13) The results of an OLS regression analysis include: n = 62, k = 8, RSS = 700, ESS = 300, and TSS = 1000.
Calculate Ftest.
What is the hypothesis that is tested using Ftest?
Is the regression equation (as a whole) significant at the alpha = 5% level of significance?
Calculate SER (notice that ESS is the same as ∑i(êi²)).
Calculate R².
Calculate Rbar².
Which should be used for this particular regression, R² or Rbar²?
What does R² (Rbar²) tell us?

14) (NOTE: Use the information in the handout “Regression Analysis in SAS” on the course website to help answer the questions in this problem.) Suppose you do some consulting work for a client who is interested in health care in North Carolina.
One determinant of the quality of health care is access to primary care doctors. The client is interested in what determines the number of primary care doctors per 1000 persons (DocsPer1000) in North Carolina counties. The client wants to know whether the number of doctors is influenced by measures of health needs, such as the number of babies per 1000 persons (BabiesPer1000) and senior citizens per 1000 persons (SeniorsPer1000), or simply the wealth of the population, as measured by median family income in 1000’s of dollars (MedInc1000s). You decide to run the following OLS regression analysis in SAS:

    proc reg data=dataset01;
      model DocsPer1000 = SeniorsPer1000 BabiesPer1000 MedInc1000s;
    run;

The results of the analysis are shown below.

    The REG Procedure
    Model: MODEL1
    Dependent Variable: DocsPer1000

    Number of Observations Read    100
    Number of Observations Used    100

    Analysis of Variance
    Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
    Model               3          3.28168       1.09389      8.31   <.0001
    Error              96         12.63374       0.13160
    Corrected Total    99         15.91542

    Root MSE           0.36277     R-Square    0.2062
    Dependent Mean     0.68325     Adj R-Sq    0.1814
    Coeff Var         53.09434

    Parameter Estimates
    Variable           DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
    Intercept           1              1.67145          0.69101      2.42     0.0175
    SeniorsPer1000      1             -0.00371          0.00169     -2.19     0.0307
    BabiesPer1000       1             -0.01842          0.00578     -3.19     0.0019
    MedInc1000s         1              0.01653          0.00606      2.73     0.0076

14 continued) Referring to the OLS regression analysis results on the previous page, answer the following questions, assuming α = 0.05:

What was the dependent variable in the regression analysis, and what were the independent variables?
What was the sample size (n) used in the analysis?
What does the F-Value tell you about the regression model?
What are the values of d.f.numerator and d.f.denominator for use in finding Fcritical from the F-table?
What is the value of Fcritical from the F-table (using α = 0.05)?
Is the F-Value significant? Briefly, how can you tell?
What do R-Square and Adj R-Sq tell you? For this regression, which should you look at, and (very briefly) why?
In SAS, the SER is called “Root MSE.” Briefly, what does it tell us?
Briefly, what do the numbers in the “Parameter Estimate” column tell us?
Briefly, what do the numbers given in the “t Value” column tell us?
Briefly, what do the numbers in the “Pr > |t|” column tell us?
So, what is the effect of SeniorsPer1000 on the number of primary care doctors per 1000 persons in a county?
So, what is the effect of BabiesPer1000 on the number of primary care doctors per 1000 persons in a county?
So, what is the effect of MedInc1000s on the number of primary care doctors per 1000 persons in a county?
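
A quick arithmetic check can help when reading the SAS output in problem 14. The sketch below is written in Python rather than SAS, purely as a calculator, and it is not part of the course handout. It recomputes the derived entries (Mean Square, F Value, Root MSE, R-Square, Adj R-Sq, t Value) from the sums of squares, degrees of freedom, parameter estimates and standard errors printed in the output. The variable names are illustrative assumptions, and the formulas are the standard textbook relationships for an OLS model with an intercept. Small differences from the printed values come from rounding in the output.

```python
import math

# Values copied from the "Analysis of Variance" table in problem 14.
ss_model, df_model = 3.28168, 3
ss_error, df_error = 12.63374, 96
ss_total, df_total = 15.91542, 99

ms_model = ss_model / df_model      # "Mean Square" for the Model row
ms_error = ss_error / df_error      # "Mean Square" for the Error row (MSE)
f_value = ms_model / ms_error       # "F Value"
root_mse = math.sqrt(ms_error)      # "Root MSE" (the SER)
r_square = ss_model / ss_total      # "R-Square"
# Adjusted R-squared penalizes R-squared for the number of X variables.
adj_r_sq = 1 - (1 - r_square) * df_total / df_error

# (Parameter Estimate, Standard Error) pairs copied from "Parameter Estimates".
params = {
    "Intercept": (1.67145, 0.69101),
    "SeniorsPer1000": (-0.00371, 0.00169),
    "BabiesPer1000": (-0.01842, 0.00578),
    "MedInc1000s": (0.01653, 0.00606),
}
# Each "t Value" is the estimate divided by its standard error.
t_values = {name: est / se for name, (est, se) in params.items()}

print(f"F Value  = {f_value:.2f}")
print(f"Root MSE = {root_mse:.5f}")
print(f"R-Square = {r_square:.4f}")
print(f"Adj R-Sq = {adj_r_sq:.4f}")
for name, t in t_values.items():
    print(f"t Value for {name} = {t:.2f}")
```

Because the estimates and standard errors in the output are themselves rounded, a recomputed t Value can differ from the printed one in the last digit.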