PROBLEM SET #3

Q.1
(i) The t-statistic is calculated by dividing
    a. the OLS estimator by its standard error.
    b. the slope by the standard deviation of the explanatory variable.
    c. the estimator minus its hypothesized value by the standard error of the estimator.
    d. the slope by 1.96.

(ii) If the absolute value of your calculated t-statistic exceeds the critical value from the standard normal distribution, you can
    a. reject the null hypothesis.
    b. safely assume that your regression results are significant.
    c. reject the assumption that the error terms are homoskedastic.
    d. conclude that most of the actual values are very close to the regression line.

(iii) Consider the following regression line: TestScore = 698.9 - 2.28 STR. You are told that the t-statistic on the slope coefficient is 4.38. What is the standard error of the slope coefficient?
    a. 0.52
    b. 1.96
    c. -1.96
    d. 4.38

(iv) One of the following steps is not required as a step to test for the null hypothesis:
    a. compute the standard error of β̂1.
    b. test for the errors to be normally distributed.
    c. compute the t-statistic.
    d. compute the p-value.

(v) Finding a small value of the p-value (e.g. less than 5%)
    a. indicates evidence in favor of the null hypothesis.
    b. implies that the t-statistic is less than 1.96.
    c. indicates evidence against the null hypothesis.
    d. will only happen roughly one in twenty samples.

(vi) The homoskedastic normal regression assumptions are all of the following with the exception of:
    a. the errors are homoskedastic.
    b. the errors are normally distributed.
    c. there are no outliers.
    d. there are at least 10 observations.

2. You have the result of a simple linear regression based on provincial level data and the province of British Columbia. 51 observations were used for this regression. The variance of the estimated error (Se²) is 2.04672.
    a. What is the sum of squared residuals (RSS)?
    b. Suppose the dependent variable Yi = the province’s mean income (in ‘000$) of males who are 18 years of age or older and Xi is the percent of males 18 years or older who are high school graduates. If the estimate of β1 = 0.18, interpret this result.

3. In a simple regression and correlation analysis based on 72 observations, we find r = 0.8 and Se = 10.
    (a) Find the amount of unexplained variation (Residual Sum of Squares).
    (b) Find the proportion of unexplained variation to the total variation.
    (c) Find the total variation of the dependent variable (TSS).

4. In a regression analysis using 16 observations, the explained variation is 40 out of a total variation of 60. Find the standard error of the estimated regression equation (Se).

5. Given the following estimated regression based on 200 observations (standard errors of the estimated coefficients are in parentheses):

    Pwidth = 2.5  -  0.77 height
            (0.77)  (0.33)

Test the hypothesis that there is no relationship between height and Pwidth. Use a 5% significance level.

6. The following questions are based on the Sir Galton question on the previous problem session:
Sir Francis Galton, a cousin of Charles Darwin, examined the relationship between the height of children and their parents towards the end of the 19th century. It is from this study that the name “regression” originated. You decide to update his findings by collecting data from 110 college students, and estimate the following relationship (standard errors of the coefficients are in parentheses):

    Studenth = 19.6 + 0.73 Midparh,   R² = 0.45, Se = 2.0
              (7.2)  (0.1)

    a. Test for the statistical significance of the slope coefficient. Use a 5% significance level.
    b. If children, on average, were expected to be of the same height as their parents (direct relationship between the variables), then this would imply two hypotheses, one for the slope and one for the intercept.
        (i) What should the null hypothesis be for the intercept? Calculate the relevant t-statistic and carry out the hypothesis test at the 5% level.
        (ii) What should the null hypothesis be for the slope? Calculate the relevant t-statistic and carry out the hypothesis test at the 5% level.
    c. Construct a 95% confidence interval for a one inch increase in the average of parental height.

Q.7
SCENARIO: Suppose you are interested in learning about the determinants of college GPA for VIU students. Being aware of the common factors, you wish to determine the effect of skipping classes on GPA. You have collected a random sample of 141 VIU students and have recorded their ages (Agei, in years), college GPA (colgpai), high school GPA (hsgpa), provincial achievement exam score (Prov), average lectures missed per week (Skipped) and the average number of days per week of alcohol consumption (Alcohol).

The statistical analyses you perform are given in the computer output below.

Across these 141 individuals, what is the mean college GPA? ______
What is the highest observed age across these 141 people? ________
What is the standard deviation in average number of days per week consumption of alcohol across the sample? ________
Do the descriptive statistics you have just provided refer to the joint distribution of these three variables, to their conditional distributions, or to their marginal distributions? ______________
What is the correlation between high school GPA and college GPA in this sample? ________
What are the units for this correlation measure? ________
Using the Descriptive Statistics only, test the hypothesis that the true marginal mean number of average lectures missed per week across all students is less than 1. Use a one-sided test.

Based on the entire output,
    a. What is the verbal interpretation of the slope of the estimated regression equation? What is the interpretation of the intercept? Is the estimate of the intercept meaningful? Why or why not?
    b. Based on the Regression output, what GPA is expected for a student who skips 2 lectures per week on average?
    c. Estimate R² for the regression and interpret your result.
    d. Find the standard error of the estimated regression (Se).
    e. Find RSS, ESS and TSS.

DESCRIPTIVE STATISTICS

                     AGE        ALCOHOL     COLGPA      HSGPA       PROV       SKIPPED
Mean               20.88652    1.901064    3.056738    3.402128    24.15603    1.076241
Median             21.00000    2.000000    3.000000    3.400000    24.00000    1.000000
Maximum            30.00000    7.000000    4.000000    4.000000    33.00000    5.000000
Minimum            19.00000    0.000000    2.200000    2.400000    16.00000    0.000000
Std. Dev.          1.271064    1.374701    0.372310    0.319926    2.844252    1.088882
Skewness           3.229636    0.981049    0.324620   -0.310957    0.062098    1.234972
Sum                2945.000    268.0500    431.0000    479.7000    3406.000    151.7500
Sum Sq. Deviation  226.1844    264.5723    19.40610    14.32936    1132.567    165.9929
Observations       141         141         141         141         141         141

CORRELATION MATRIX

             AGE        ALCOHOL     COLGPA      HSGPA       PROV       SKIPPED
AGE        1.000000
ALCOHOL   -0.022005    1.000000
COLGPA    -0.019504    0.017187    1.000000
HSGPA     -0.259368   -0.045805    0.414555    1.000000
PROV      -0.082002    0.169029    0.206754    0.345806    1.000000
SKIPPED   -0.077569    0.337670   -0.261820   -0.089662    0.115485    1.000000

REGRESSION OUTPUT

Dependent Variable: COLGPA
Method: Least Squares
Date: 01/18/06   Time: 11:51
Sample: 1 141
Included observations: 141

Variable    Coefficient    Std. Error    t-Statistic    Prob.
C             3.153084      0.042775      73.71302      0.0000
SKIPPED      -0.089521      0.027990      -3.198383     0.0017

R-squared                             Mean dependent var       3.056738
Adjusted R-squared      0.061849      S.D. dependent var       0.372310
S.E. of regression                    Akaike info criterion    0.812061
Sum squared residuals   18.07582      Schwarz criterion        0.853887
Log likelihood          -55.2502      F-statistic              10.22965
Prob(F-statistic)       0.001712      Durbin-Watson stat       1.983121
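
The entries of the regression output above are tied together by a few standard identities for a simple regression with an intercept: t = coefficient / standard error, TSS = (n - 1) x (S.D. of the dependent variable)², R² = 1 - RSS/TSS, and Se = sqrt(RSS / (n - 2)). The following is a minimal Python sketch of how those relationships can be used to cross-check the printed figures. The variable names are illustrative and not part of the problem set, the numbers are copied from the output, and the slope is entered with a negative sign on the assumption that it must agree with the reported t-statistic of -3.198383.

```python
import math

# Values copied from the descriptive statistics and regression output.
n = 141            # included observations
rss = 18.07582     # "Sum squared residuals"
tss = 19.40610     # "Sum Sq. Deviation" of COLGPA
sd_y = 0.372310    # "S.D. dependent var"

# Assumption: the slope is negative, consistent with its reported t-statistic.
coef = {"C": 3.153084, "SKIPPED": -0.089521}
se = {"C": 0.042775, "SKIPPED": 0.027990}

# t-statistic for H0: coefficient = 0 is the estimate over its standard error.
t_stats = {name: coef[name] / se[name] for name in coef}

# Total variation can also be recovered from the sample standard deviation.
tss_from_sd = (n - 1) * sd_y ** 2

# Decomposition of variation and goodness of fit.
ess = tss - rss
r2 = 1 - rss / tss
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - 2)

# Standard error of the regression: two parameters estimated, so n - 2 d.o.f.
s_e = math.sqrt(rss / (n - 2))

# With a single regressor, the F-statistic equals the squared slope t-statistic.
f_stat = t_stats["SKIPPED"] ** 2

for label, value in [
    ("t(C)", t_stats["C"]),
    ("t(SKIPPED)", t_stats["SKIPPED"]),
    ("TSS from S.D.", tss_from_sd),
    ("ESS", ess),
    ("R-squared", r2),
    ("Adjusted R-squared", adj_r2),
    ("S.E. of regression", s_e),
    ("F-statistic", f_stat),
]:
    print(f"{label:>20s}: {value:.6f}")
```

Small differences from the printed output in the last decimal places are expected, because the inputs here are already rounded to the digits shown in the tables.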