Homework 10

UNC-Wilmington
Department of Economics and Finance
ECN 377
Dr. Chris Dumas
(Due Tuesday, October 6th)
Multiple Choice
1) The purpose/goal of Regression Analysis is:
a) to fit the best line to a set of data
b) to determine the relationships among variables
c) to find the best values for the parameters in a model
d) both b and c
2) What does the error term represent in a Regression Analysis?
a) the effects of variables omitted from the model
b) the effects of measurement errors
c) the effects of data recording/copying errors
d) all of the above
3) Regression Analysis can be used to investigate:
a) only relationships that are linear in the variables
b) linear or non-linear relationships among only two variables
c) only relationships that are non-linear in the parameters
d) none of the above
4) When we use the Ordinary Least Squares (OLS) method to conduct Regression Analysis, we are calculating:
a) the model parameter values (values of β̂0 and β̂1) that maximize Y
b) the model parameter values (values of β̂0 and β̂1) that minimize the error in the model
c) the model parameter values (values of β̂0 and β̂1) that minimize the sum of the squared errors in the model
d) the model parameter values (values of β̂0 and β̂1) that maximize the slope of the model equation
e) the model parameter values (values of β̂0 and β̂1) that minimize the sum of the squared distances between the Y values in our data set and the graph of the model equation
f) both c and e
g) both b and d
5) What does the Standard Error of the Regression (SER) measure?
a) the statistical significance of model parameter values (values of β̂0 and β̂1)
b) the error of a forecast for Y that we make using a regression model
c) the variation/spread of the Y values around the graph of the regression equation
d) none of the above
6) After doing a regression analysis for your boss, he asks you: “So, how well does your regression equation fit the data?”
Which of the following could be part of your reply?
a) the SER of the final model was higher than the SER of the other models
b) the coefficient of determination of the final model was 0.72
c) the SER of the final model was lower than the SER of the other models
d) r² of the final model was 0.72
e) a, b and d are correct
f) b, c and d are correct
7) In SAS, which procedure does one use to conduct regression analysis?
a) proc ols
b) proc reg
c) proc corr
d) proc contents
8) The Gauss-Markov Theorem states:
If the assumptions of the Ordinary Least Squares (OLS) method are met, then . . .
a) for a sufficiently large sample size, the distribution of Xbar will be normal
b) the OLS parameter estimates (β̂0 and β̂1) will be unbiased (correct on average)
c) the OLS parameter estimates (β̂0 and β̂1) will have lower variance (error) than the estimates that could be produced using any other linear unbiased estimator
d) both b and c
True/False, Calculation and Short Answer Section
9) True or False. Which of the following assumptions is/are behind the Ordinary Least Squares (OLS) method:
_____ in the population, the relationship between the variables is linear in the variables
_____ in the population, the relationship between the variables is linear in the parameters
_____ the X variables are correlated with the population error term e
_____ the model is correctly specified (contains the correct variables, and only the correct variables)
_____ the distribution of the error term is normal
_____ the error terms for various individuals in the population are not correlated with one another
_____ the variance of the error term is the same for all individuals in the population
10) When conducting an OLS regression analysis, several steps are required. Put the following steps in the correct order by
numbering them in the blanks provided. The first step should be numbered “1,” the second step “2,” etc.
_____ Calculate the “Goodness of Fit” measures SER and R².
_____ Describe the sign and magnitude of the effect of each (statistically significant) X on Y.
_____ Use the OLS Regression Estimator Equations to estimate the β̂’s for the regression model.
_____ Check the ttest numbers for the β̂’s to determine which of the β̂’s are statistically significant.
_____ Use the F-test to determine whether the regression as a whole is statistically significant.
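The “OLS Regression Estimator Equations” mentioned in question 10 can be illustrated for the one-X model Ŷ = β̂0 + β̂1 ∙ X1. The sketch below is written in Python rather than SAS and uses a small made-up data set (the x and y values are assumptions, not course data). It applies the standard textbook formulas β̂1 = ∑(Xi − Xbar)(Yi − Ybar) / ∑(Xi − Xbar)² and β̂0 = Ybar − β̂1 ∙ Xbar; the function name ols_simple is hypothetical.

```python
def ols_simple(x, y):
    """Return (b0_hat, b1_hat) for the one-X model Y = b0 + b1*X + e."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of cross-products of deviations from the means
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    # Sum of squared deviations of X from its mean
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1_hat = sxy / sxx                # slope estimate
    b0_hat = y_bar - b1_hat * x_bar   # intercept estimate
    return b0_hat, b1_hat


# Hypothetical data, for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0_hat, b1_hat = ols_simple(x, y)
print(f"b0_hat = {b0_hat:.4f}, b1_hat = {b1_hat:.4f}")
```

With more than one X variable the same idea applies, but the estimates are normally obtained from software such as proc reg rather than by hand.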
11) The graph below shows the components of RSS, ESS and TSS for the individual in the sample with X and Y values given
by data point (Xi,Yi). (Many other data points are not shown. Each of the other data points would also have components of
RSS, ESS and TSS similar to those shown below for data point (Xi,Yi).) Properly label the indicated distances on the graph
with “ESS,” “RSS” or “TSS.”
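[Graph: Components of TSS, RSS and ESS. Vertical axis Y, horizontal axis X1. The regression line Ŷ = β̂0 + β̂1 ∙ X1 is drawn. At X1 = Xi, the levels Yi, Ŷ and Ȳ are marked on the Y axis (in that order from top to bottom), with three blank distances to be labeled.]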
12) The graph below shows RSS, ESS and TSS for another data point. Properly label the indicated distances on the graph
with “ESS,” “RSS” or “TSS.”
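[Graph: Components of TSS, RSS and ESS. Vertical axis Y, horizontal axis X1. The regression line Ŷ = β̂0 + β̂1 ∙ X1 is drawn. At X1 = Xi, the levels Ȳ, Ŷ and Yi are marked on the Y axis (in that order from top to bottom), with three blank distances to be labeled.]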
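Questions 11 and 12 look at one data point at a time. Summed over all data points, the three quantities satisfy TSS = RSS + ESS, as in question 13 where 700 + 300 = 1000. The Python sketch below checks this identity on made-up data (the x and y values are assumptions). It follows the naming convention stated in question 13, where ESS is the sum of squared residuals ∑(ê²), so RSS here denotes the part of TSS accounted for by the regression line. Note that some textbooks swap these two labels.

```python
def sums_of_squares(x, y):
    """Return (tss, rss, ess) for a one-X OLS fit, using this course's labels."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    b1_hat = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
              / sum((xi - x_bar) ** 2 for xi in x))
    b0_hat = y_bar - b1_hat * x_bar
    y_hat = [b0_hat + b1_hat * xi for xi in x]   # fitted values

    tss = sum((yi - y_bar) ** 2 for yi in y)                # total variation in Y
    rss = sum((yh - y_bar) ** 2 for yh in y_hat)            # variation captured by the line
    ess = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))   # sum of squared residuals
    return tss, rss, ess


# Hypothetical data, for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

tss, rss, ess = sums_of_squares(x, y)
print(f"TSS = {tss:.3f}, RSS = {rss:.3f}, ESS = {ess:.3f}, RSS + ESS = {rss + ess:.3f}")
```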
13) The results of an OLS regression analysis include: n = 62, k = 8, RSS = 700, ESS = 300, and TSS = 1000.
• Calculate Ftest.
• What is the hypothesis that is tested using Ftest?
• Is the regression equation (as a whole) significant at the alpha = 5% level of significance?
• Calculate SER (notice that ESS is the same as ∑ᵢ(êᵢ²)).
• Calculate R².
• Calculate Rbar².
• Which should be used for this particular regression, R² or Rbar²?
• What does R² (Rbar²) tell us?
14) (NOTE: Use the information in the handout “Regression Analysis in SAS” on the course website to help answer the
questions in this problem.) Suppose you do some consulting work for a client who is interested in health care in North
Carolina. One determinant of the quality of health care is access to primary care doctors. The client is interested in what
determines the number of primary care doctors per 1000 persons (DocsPer1000) in North Carolina counties. The client
wants to know whether the number of doctors is influenced by measures of health needs, such as the number of babies per
1000 persons (BabiesPer1000) and senior citizens per 1000 persons (SeniorsPer1000), or simply the wealth of the population,
as measured by median family income in 1000’s of dollars (MedInc1000s). You decide to run the following OLS regression
analysis in SAS:
proc reg data=dataset01;
model DocsPer1000 = SeniorsPer1000 BabiesPer1000 MedInc1000s;
run;
The results of the analysis are shown below.
The REG Procedure
Model: MODEL1
Dependent Variable: DocsPer1000

Number of Observations Read    100
Number of Observations Used    100

                      Analysis of Variance
                           Sum of        Mean
Source             DF     Squares      Square    F Value    Pr > F
Model               3     3.28168     1.09389       8.31    <.0001
Error              96    12.63374     0.13160
Corrected Total    99    15.91542

Root MSE           0.36277    R-Square    0.2062
Dependent Mean     0.68325    Adj R-Sq    0.1814
Coeff Var         53.09434

                      Parameter Estimates
                        Parameter    Standard
Variable          DF     Estimate       Error    t Value    Pr > |t|
Intercept          1      1.67145     0.69101       2.42      0.0175
SeniorsPer1000     1     -0.00371     0.00169      -2.19      0.0307
BabiesPer1000      1     -0.01842     0.00578      -3.19      0.0019
MedInc1000s        1      0.01653     0.00606       2.73      0.0076
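As a cross-check on how the pieces of the SAS output above fit together, the following Python sketch (separate from the SAS workflow) recomputes several summary numbers from the printed sums of squares, degrees of freedom, estimates and standard errors. It relies on the standard relationships Mean Square = Sum of Squares / DF, F Value = Mean Square Model / Mean Square Error, Root MSE = √(Mean Square Error), R-Square = Model SS / Total SS, and t Value = Parameter Estimate / Standard Error. All variable names are chosen for illustration. Because the inputs are the rounded values shown in the listing, small differences in the last digit are expected.

```python
import math

# Values copied from the "Analysis of Variance" table in the SAS output
ss_model, df_model = 3.28168, 3
ss_error, df_error = 12.63374, 96
ss_total, df_total = 15.91542, 99
dependent_mean = 0.68325

ms_model = ss_model / df_model        # Mean Square, Model row
ms_error = ss_error / df_error        # Mean Square, Error row
f_value = ms_model / ms_error         # F Value
root_mse = math.sqrt(ms_error)        # Root MSE
r_square = ss_model / ss_total        # R-Square
adj_r_sq = 1 - (ss_error / df_error) / (ss_total / df_total)  # Adj R-Sq
coeff_var = 100 * root_mse / dependent_mean                   # Coeff Var

# (Parameter Estimate, Standard Error) pairs from the "Parameter Estimates" table
params = {
    "Intercept": (1.67145, 0.69101),
    "SeniorsPer1000": (-0.00371, 0.00169),
    "BabiesPer1000": (-0.01842, 0.00578),
    "MedInc1000s": (0.01653, 0.00606),
}
t_values = {name: est / se for name, (est, se) in params.items()}

print(f"F Value  = {f_value:.2f}")
print(f"Root MSE = {root_mse:.5f}")
print(f"R-Square = {r_square:.4f}, Adj R-Sq = {adj_r_sq:.4f}")
print(f"Coeff Var = {coeff_var:.3f}")
for name, t in t_values.items():
    print(f"t Value for {name}: {t:.2f}")
```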
14 continued) Referring to the OLS regression analysis results shown above, answer the following
questions, assuming α = 0.05:
What was the dependent variable in the regression analysis, and what were the independent variables?
What was the sample size (n) used in the analysis?
What does the F-Value tell you about the regression model?
What are the values of d.f.numerator and d.f.denominator for use in finding Fcritical from the F-table?
What is the value of Fcritical from the F-table (using α = 0.05)?
Is the F-Value significant? Briefly, how can you tell?
What do R-Square and Adj R-Sq tell you? For this regression, which should you look at, and (very briefly) why?
In SAS, the SER is called “Root MSE.” Briefly, what does it tell us?
Briefly, what do the numbers in the “Parameter Estimate” column tell us?
Briefly, what do the numbers given in the “t Value” column tell us?
Briefly, what do the numbers in the “Pr > |t|” column tell us?
So, what is the effect of SeniorsPer1000 on the number of primary care doctors per 1000 persons in a county?
So, what is the effect of BabiesPer1000 on the number of primary care doctors per 1000 persons in a county?
So, what is the effect of MedInc1000s on the number of primary care doctors per 1000 persons in a county?
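One way to explore the estimated equation is to plug county characteristics into it. The Python sketch below uses the numbers from the “Parameter Estimate” column to build a prediction function for DocsPer1000. The two counties are hypothetical (their SeniorsPer1000, BabiesPer1000 and MedInc1000s values are assumptions chosen only for illustration), and the function name predict_docs is likewise made up. The counties differ only in MedInc1000s, so the difference in predictions isolates that one variable while the others are held constant.

```python
# Parameter estimates copied from the SAS output
B0 = 1.67145           # Intercept
B_SENIORS = -0.00371   # SeniorsPer1000
B_BABIES = -0.01842    # BabiesPer1000
B_INCOME = 0.01653     # MedInc1000s


def predict_docs(seniors_per_1000, babies_per_1000, med_inc_1000s):
    """Predicted DocsPer1000 from the estimated regression equation."""
    return (B0
            + B_SENIORS * seniors_per_1000
            + B_BABIES * babies_per_1000
            + B_INCOME * med_inc_1000s)


# Hypothetical counties: identical except for median family income
county_a = predict_docs(150, 12, 50)   # median income $50,000
county_b = predict_docs(150, 12, 51)   # median income $51,000
difference = county_b - county_a

print(f"County A: {county_a:.5f} doctors per 1000 persons")
print(f"County B: {county_b:.5f} doctors per 1000 persons")
print(f"Difference: {difference:.5f}")
```

The same comparison can be repeated for SeniorsPer1000 or BabiesPer1000 by changing one input at a time.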