PROBLEM SET #3
Q.1
(i)
The t-statistic is calculated by dividing
a. the OLS estimator by its standard error.
b. the slope by the standard deviation of the explanatory variable.
c. the estimator minus its hypothesized value by the standard error of the
estimator.
d. the slope by 1.96.
(ii)
If the absolute value of your calculated t-statistic exceeds the critical value from the
standard normal distribution, you can
a. reject the null hypothesis.
b. safely assume that your regression results are significant.
c. reject the assumption that the error terms are homoskedastic.
d. conclude that most of the actual values are very close to the regression line.
(iii)
Consider the following regression line: TestScore = 698.9 - 2.28 × STR. You are told that
the t-statistic on the slope coefficient is 4.38. What is the standard error of the slope
coefficient?
a. 0.52
b. 1.96
c. -1.96
d. 4.38
(iv)
One of the following steps is not required as a step to test for the null hypothesis:
a. compute the standard error of β̂1.
b. test for the errors to be normally distributed.
c. compute the t-statistic.
d. compute the p-value.
(v)
Finding a small value of the p-value (e.g. less than 5%)
a. indicates evidence in favor of the null hypothesis.
b. implies that the t-statistic is less than 1.96.
c. indicates evidence against the null hypothesis.
d. will only happen roughly one in twenty samples.
(vi)
The homoskedastic normal regression assumptions are all of the following with the
exception of:
a. the errors are homoskedastic.
b. the errors are normally distributed.
c. there are no outliers.
d. there are at least 10 observations.
2. You have the result of a simple linear regression based on provincial-level data and the province of
British Columbia. 51 observations were used for this regression. The variance of the estimated
error (Se²) is 2.04672.
a. What is the sum of squared residuals (RSS)?
b. Suppose the dependent variable Yi = the province's mean income (in '000$) of males
who are 18 years of age or older and Xi is the percent of males 18 years or older who are
high school graduates. If the estimate of β1 = 0.18, interpret this result.
3. In a simple regression and correlation analysis based on 72 observations, we find r = 0.8 and
Se = 10.
(a) Find the amount of unexplained variation (Residual Sum of Squares).
(b) Find the proportion of unexplained variation to the total variation.
(c) Find the total variation of the dependent variable (TSS).
4. In a regression analysis using 16 observations, the explained variation is 40 out of a total variation of
60. Find the standard error of the estimated regression equation (Se)
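The quantities in problems 3 and 4 are linked by TSS = ESS + RSS, R² = ESS/TSS and Se = sqrt(RSS / (n - 2)). The following is a minimal Python sketch of those relationships, not a prescribed solution method. It assumes a simple regression with two estimated coefficients (so n - 2 degrees of freedom) and that R² equals r² in the simple-regression case; the function names are illustrative only.

```python
import math

def se_from_rss(rss, n, k=2):
    """Standard error of the regression, assuming k estimated coefficients."""
    return math.sqrt(rss / (n - k))

def rss_from_se(se, n, k=2):
    """Invert the formula above: RSS = Se^2 * (n - k)."""
    return se ** 2 * (n - k)

# Problem 4: explained variation 40 out of a total of 60, n = 16.
tss4, ess4, n4 = 60.0, 40.0, 16
rss4 = tss4 - ess4                 # unexplained variation
se4 = se_from_rss(rss4, n4)

# Problem 3: r = 0.8, Se = 10, n = 72.
rss3 = rss_from_se(10.0, 72)
unexplained_share3 = 1 - 0.8 ** 2  # 1 - R^2, with R^2 = r^2 in simple regression
tss3 = rss3 / unexplained_share3

print(round(se4, 4), rss3, round(unexplained_share3, 2), round(tss3, 2))
```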
5. Given the following estimated values for the following regression based on 200 observations
(standard errors of estimated coefficients are in parentheses):
Pwidth =  2.5   -  0.77 height
         (0.77)   (0.33)
Test the hypothesis that there is no relationship between height and Pwidth. Use a 5%
significance level.
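A test of this kind uses the t-statistic from Q.1: the estimate minus its hypothesized value, divided by the standard error. Below is a hedged Python sketch using the slope and standard error quoted in problem 5. The critical value 1.96 is the large-sample (standard normal) value for a two-sided 5% test, which is an assumption here; the function names are hypothetical.

```python
def t_statistic(estimate, std_error, hypothesized=0.0):
    """(estimate - hypothesized value) / standard error of the estimate."""
    return (estimate - hypothesized) / std_error

def reject_two_sided(t, critical=1.96):
    """Two-sided test: reject H0 when |t| exceeds the critical value."""
    return abs(t) > critical

# Slope on height and its standard error, as given in problem 5.
t_height = t_statistic(-0.77, 0.33)   # H0: slope = 0 (no relationship)
reject_h0 = reject_two_sided(t_height)

print(round(t_height, 3), reject_h0)
```

With 200 observations the Student t critical value is very close to the normal one, so the choice between the two should not change the decision in this case.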
6. The following questions are based on the Galton question from the previous problem session:
Sir Francis Galton, a cousin of Charles Darwin, examined the relationship between the height of
children and their parents towards the end of the 19th century. It is from this study that the
name "regression" originated. You decide to update his findings by collecting data from 110
college students, and estimate the following relationship (standard errors of the coefficients are
in parentheses):
Studenth = 19.6 + 0.73 Midparh,   R² = 0.45,   Se = 2.0
          (7.2)  (0.1)
a. Test for the statistical significance of the slope coefficient. Use a 5% significance level.
b. If children, on average, were expected to be of the same height as their parents (direct
relationship between the variables), then this would imply two hypotheses, one for the
slope and one for the intercept.
(i) What should the null hypothesis be for the intercept? Calculate the relevant t-statistic and carry out the hypothesis test at the 5% level.
(ii) What should the null hypothesis be for the slope? Calculate the relevant t-statistic and carry out the hypothesis test at the 5% level.
c. Construct a 95% confidence interval for a one inch increase in the average of parental height.
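A 95% confidence interval for a slope takes the form estimate ± critical value × standard error. The Python sketch below applies that form to the slope on Midparh reported above. It assumes the large-sample critical value 1.96 (the exact Student t value with 108 degrees of freedom is slightly larger), and the helper name is illustrative.

```python
def confidence_interval(estimate, std_error, critical=1.96):
    """Return (lower, upper) bounds: estimate -/+ critical * std_error."""
    half_width = critical * std_error
    return estimate - half_width, estimate + half_width

# Slope on Midparh and its standard error, as reported in problem 6.
lower, upper = confidence_interval(0.73, 0.1)

# Under a one-for-one relationship the slope would equal 1;
# check whether that value lies inside the interval.
contains_one = lower <= 1.0 <= upper

print(round(lower, 3), round(upper, 3), contains_one)
```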
Q.7
SCENARIO: Suppose you are interested in learning about the determinants of college GPA for VIU
students. Being aware of the common factors, you wish to determine the effect of skipping classes on
GPA. You have collected a random sample of 141 VIU students and have recorded their ages (Age, in
years), college GPA (colgpa), high school GPA (hsgpa), provincial achievement exam score (Prov),
average lectures missed per week (Skipped) and the average number of days per week of alcohol
consumption (Alcohol).
The statistical analyses you perform are given in the computer output below:
Across these 141 individuals, what is the mean college GPA? ______
What is the highest observed age across these 141 people? ________
What is the standard deviation of the average number of days per week of alcohol consumption across
the sample? ________
Do the descriptive statistics you have just provided refer to the joint distribution of these three
variables, to their conditional distributions, or to their marginal distributions? ______________
What is the correlation between high school GPA and college GPA in this sample? ________
What are the units for this correlation measure? ________
Using the Descriptive Statistics only, test the hypothesis that the true marginal mean number of
average lectures missed per week across all students is less than 1. Use a one-sided test.
Based on the entire output,
a. What is the verbal interpretation of the slope of the estimated regression equation? What is the
interpretation of the intercept? Is the estimate of the intercept meaningful? Why or why not?
b. Based on the Regression output, what GPA is expected for a student who skips 2 lectures per
week on average?
c. Estimate R² for the regression and interpret your result.
d. Find the standard error of the estimated regression (Se).
e. Find RSS, ESS and TSS.
DESCRIPTIVE STATISTICS
                          AGE    ALCOHOL     COLGPA      HSGPA       PROV    SKIPPED
Mean                 20.88652   1.901064   3.056738   3.402128   24.15603   1.076241
Median               21.00000   2.000000   3.000000   3.400000   24.00000   1.000000
Maximum              30.00000   7.000000   4.000000   4.000000   33.00000   5.000000
Minimum              19.00000   0.000000   2.200000   2.400000   16.00000   0.000000
Std. Dev.            1.271064   1.374701   0.372310   0.319926   2.844252   1.088882
Skewness             3.229636   0.981049   0.324620  -0.310957   0.062098   1.234972
Sum                  2945.000   268.0500   431.0000   479.7000   3406.000   151.7500
Sum Sq. Deviation    226.1844   264.5723   19.40610   14.32936   1132.567   165.9929
Observations              141        141        141        141        141        141
CORRELATION MATRIX
                  AGE    ALCOHOL     COLGPA      HSGPA       PROV    SKIPPED
AGE          1.000000
ALCOHOL     -0.022005   1.000000
COLGPA      -0.019504   0.017187   1.000000
HSGPA       -0.259368  -0.045805   0.414555   1.000000
PROV        -0.082002   0.169029   0.206754   0.345806   1.000000
SKIPPED     -0.077569   0.337670  -0.261820  -0.089662   0.115485   1.000000
REGRESSION OUTPUT
Dependent Variable: COLGPA
Method: Least Squares
Date: 01/18/06 Time: 11:51
Sample: 1 141
Included observations: 141

Variable     Coefficient   Std. Error   t-Statistic    Prob.
C               3.153084     0.042775      73.71302   0.0000
SKIPPED        -0.089521     0.027990     -3.198383   0.0017

R-squared               ______     Mean dependent var      3.056738
Adjusted R-squared      0.061849   S.D. dependent var      0.372310
S.E. of regression      ______     Akaike info criterion   0.812061
Sum squared residuals   18.07582   Schwarz criterion       0.853887
Log likelihood          -55.2502   F-statistic             10.22965
Durbin-Watson stat      1.983121   Prob(F-statistic)       0.001712
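The entries left blank in the regression output can be recovered from figures printed elsewhere in the output. The Python sketch below shows one way to do so, under these assumptions: TSS is the "Sum Sq. Deviation" of COLGPA from the descriptive statistics, Se uses n - 2 degrees of freedom, and the SKIPPED coefficient is negative, as implied by its negative t-statistic. Variable names are illustrative.

```python
import math

n = 141
rss = 18.07582   # "Sum squared residuals" in the regression output
tss = 19.40610   # "Sum Sq. Deviation" of COLGPA (assumed to be the TSS)

ess = tss - rss                 # explained variation
r2 = ess / tss                  # share of variation in COLGPA explained
se = math.sqrt(rss / (n - 2))   # standard error of the regression

# Predicted GPA for a student who skips 2 lectures per week.
intercept, slope = 3.153084, -0.089521   # slope sign inferred from the t-statistic
gpa_skip2 = intercept + slope * 2

# t-statistic for the sample mean of SKIPPED against the hypothesized value 1.
mean_skipped, sd_skipped = 1.076241, 1.088882
t_mean = (mean_skipped - 1.0) / (sd_skipped / math.sqrt(n))

print(round(ess, 5), round(r2, 4), round(se, 4), round(gpa_skip2, 3), round(t_mean, 3))
```

As a consistency check, the R² obtained this way should agree with the squared COLGPA-SKIPPED correlation from the correlation matrix, since the regression has a single explanatory variable.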