Homework 5
Covers Chapters 9, 10 and 14

Use the High School and Beyond (HSB) data set. The data are explained in the HSB Read Me file.

USE MATH AS RESPONSE

1. With Math as the response and the remaining variables as predictors (excluding ID, which serves only as an identifier), how many models are possible (assume an intercept for all models)?

2. Using Math as the response, analyze the data using Minitab Backward, Forward, and Stepwise Regression (keep the default settings). Specify the “best” regression equation identified by each of these three methods. How many steps did each method take? Do they agree?

   Backward:    Steps:
   Forward:    Steps:
   Stepwise:    Steps:
   Agree?

3. In the Backward Elimination analysis, which variable was removed first and why?

4. In the Forward and Stepwise analyses, which variable entered first and why?

5. In the Backward Elimination analysis, how much does R-squared change between the model in Step 4 and the final model?

6. Now regress Math on all of the predictors and use Best Subsets in Minitab to determine the variables that make up the best model according to R-squared, adjusted R-squared, and lowest Cp. What are the variables and criterion values, and are the models the same? [Remember that the goal is to reduce the number of variables from the full model.]

   Lowest Cp:    Value:
   R-squared:    Value:
   Adj. R-squared:    Value:
   All models the same?

7. Regress Math on Reading, Writing and Science. Click Storage and select Cook’s Distance (Di). Determine whether any of these Di values flag an observation as influential by checking whether any Di exceeds the 50th percentile of the F-distribution with p and n-p degrees of freedom. That is, find the cumulative F probability for each value in the Di column. If a cumulative probability exceeds 0.5, that observation would be considered an outlier. Also, in the output under Unusual Observations, any observation marked with an “X” indicates an influential outlier. Do any exist in this regression analysis?
   DF:
   Number of Di values whose cumulative probability exceeds 0.5:
   Observations that are considered influential outliers:

8. When I was younger, female students were believed to have better writing skills than male students. Using SEX as a binary response and WRTG as a predictor, answer the following questions:

   a) What is the slope estimate and its interpretation when using the Logit link function?

      Slope:
      Interpretation:

   b) What is the slope estimate and its interpretation when using the Probit link function?

      Slope:
      Interpretation:

   c) Using the Logit link, calculate BY HAND the probability that a student is female if they have a writing score of 60, and provide an interpretation of this result.

   d) What is the interpretation of the odds ratio?

   e) Provide a value for, and an explanation of, the predictive ability of the model.

   f) Female students were also believed to have better reading skills than male students. Run a multiple logistic regression (logit link) model that also includes the variable RDG. Comparing the log-likelihoods of the two models, can we conclude that the model with both reading and writing is statistically better than the model containing only writing (use a 5% level of significance)? Include the chi-square test statistic, degrees of freedom, p-value (from Minitab), and conclusion.

      Test Statistic:
      DF:
      P-value:
      Conclusion:
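
If you would like to check the hand calculations in Question 8 outside of Minitab, the short Python sketch below may help. It is only an illustration under stated assumptions: the function names are made up for this handout, the coefficients and log-likelihoods in the usage lines are placeholders rather than HSB results, and it assumes Minitab is modeling "female" as the event. The closed-form p-value applies only when the larger model adds a single parameter, as in part f).

```python
import math


def logit_probability(intercept, slope, x):
    """Event probability under a logit link: 1 / (1 + exp(-(b0 + b1*x)))."""
    eta = intercept + slope * x  # linear predictor on the log-odds scale
    return 1.0 / (1.0 + math.exp(-eta))


def odds_ratio(slope):
    """Multiplicative change in the odds for a one-unit increase in x."""
    return math.exp(slope)


def lr_test_one_added_term(loglik_reduced, loglik_full):
    """Likelihood ratio test for nested models that differ by one parameter.

    Returns (G, p_value), where G = 2 * (LL_full - LL_reduced).
    For a chi-square with 1 df, P(X > g) = erfc(sqrt(g / 2)).
    """
    g = 2.0 * (loglik_full - loglik_reduced)
    g = max(g, 0.0)  # guard against tiny negative values from rounding
    p_value = math.erfc(math.sqrt(g / 2.0))
    return g, p_value


# Placeholder inputs (NOT estimates from the HSB data): substitute the
# coefficients and log-likelihoods reported in your own Minitab output.
b0, b1 = -3.0, 0.05
prob = logit_probability(b0, b1, 60)
ratio = odds_ratio(b1)
g_stat, p_val = lr_test_one_added_term(-130.0, -127.5)
print(prob, ratio, g_stat, p_val)
```

The probability in part c) should still be worked by hand; the sketch is meant only as a check on the arithmetic.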