hw_05

Homework 5
Covers Chapters 9, 10 and 14
Use the High School and Beyond (HSB) data set.
The data is explained in the HSB Read Me file.
USE MATH AS RESPONSE
1. With Math as the response and the remaining variables as predictors (excluding ID as
that serves only as an identifier), how many models are possible (assume an intercept for
all models)?
2. Using Math as response, analyze the data using Minitab Backward, Forward, and
Stepwise Regression (keep default settings). Specify the “best” regression equation
identified by these three methods. How many steps did it take for each method? Do they
agree?
Backward:
Steps:
Forward:
Steps:
Stepwise:
Steps:
Agree?
3. In the Backward Elimination analysis, which variable was removed first and why?
4. In the Forward and Stepwise analyses, which variable entered first and why?
5. In the Backward Elimination analysis, how much does R² change between the
model in Step 4 and the final model?
6. Now regress Math on all of the predictors and use Best Subsets in Minitab to
determine the variables that comprise the best model using R-squared, adjusted R-squared, and lowest Cp. What are the variables and criterion values, and are the models the
same? [Remember that the goal is to reduce the number of variables from the full model.]
Lowest Cp:
Value:
R-squared:
Value:
Adj. R-Squared:
Value:
All models the same?
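For reference, the two adjusted criteria named in Question 6 are commonly defined as below. This is a sketch in standard textbook notation, which is an assumption here and may differ from the course text: n is the number of observations, p is the number of estimated parameters in the candidate model (including the intercept), SSE_p is that model's error sum of squares, SSTO is the total sum of squares, and MSE_full is the mean squared error of the model with all predictors.

$$
R^2_{\text{adj}} = 1 - \frac{n-1}{n-p}\cdot\frac{SSE_p}{SSTO},
\qquad
C_p = \frac{SSE_p}{MSE_{\text{full}}} - (n - 2p)
$$

Under these definitions, a model with little bias tends to have a Cp value close to p.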
7. Regress Math on Reading, Writing and Science. Click Storage and select Cook’s
Distance (Di). Determine whether any of the Di values mark an observation as
influential by checking whether they exceed the 50th percentile of the F-distribution with p and
n-p degrees of freedom. That is, find the cumulative F probability for each value in the Di
column. If a cumulative probability exceeds 0.5, then that observation would be
considered an influential outlier. Also, in the output under Unusual Observations, any observation
marked with an “X” indicates an influential outlier. Do any exist in this regression
analysis?
DF:
Number of Di values with cumulative F probability greater than 0.5:
Observations that are considered influential outliers:
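The cumulative-probability check described in Question 7 can be cross-checked outside Minitab. The following is a minimal Python sketch and not part of the assignment. The function names, the Di values, p, and n are all hypothetical placeholders, and p is assumed to count the intercept. The F cumulative probability is computed from the regularized incomplete beta function using only the standard library.

```python
import math

def _beta_cf(a, b, x, max_iter=200, eps=1e-12):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the recurrence.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the recurrence.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h

def _reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Use the symmetry relation where the continued fraction converges faster.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b

def f_cdf(x, df1, df2):
    """Cumulative probability P(F <= x) for an F(df1, df2) distribution."""
    if x <= 0.0:
        return 0.0
    return _reg_inc_beta(df1 / 2.0, df2 / 2.0, df1 * x / (df1 * x + df2))

def flag_influential(cooks_d, p, n, cutoff=0.5):
    """Return 1-based row numbers whose Di has cumulative F(p, n - p) probability above cutoff."""
    return [i for i, d in enumerate(cooks_d, start=1)
            if f_cdf(d, p, n - p) > cutoff]

# Hypothetical Di values and sizes, NOT taken from the HSB data.
example_d = [0.01, 0.20, 0.90]
print(flag_influential(example_d, p=2, n=12))  # → [3]
```

To apply this to the actual assignment, the stored Di column from Minitab would replace `example_d`, and p and n would come from the fitted regression.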
8. When I was younger, female students were believed to have better writing skills than
male students. Using SEX as a binary response and WRTG as a predictor, answer the
following questions:
a) What is the slope estimate and interpretation when using the Logit link function?
Slope:
Interpretation:
b) What is the slope estimate and interpretation when using the Probit link function?
Slope:
Interpretation:
c) Using the Logit link, calculate BY HAND the probability that a student is female if
they have a writing score of 60 and provide an interpretation of this result.
d) What is the interpretation of the odds ratio?
e) Explain, and give a value for, the predictive ability of the model.
f) Female students were also believed to have better reading skills than male students.
Run a multiple logistic regression (logit link) model that includes the variable RDG.
Comparing the log-likelihood of the two models, can we conclude the model with both
reading and writing is a statistically better model than the model only containing writing
(use a 5% level of significance)? Include the chi-square test statistic, degrees of freedom,
p-value (from Minitab), and conclusion.
Test Statistic:
DF:
P-value:
Conclusion:
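The hand calculation in part (c) and the log-likelihood comparison in part (f) can be sanity-checked with a short Python sketch. It is illustrative only and not part of the assignment. The coefficients and log-likelihoods are hypothetical placeholders, not Minitab output, and the function names are invented here. The test shown assumes the larger model adds exactly one predictor, so the statistic G = -2(LL_reduced - LL_full) is compared with a chi-square distribution on 1 degree of freedom.

```python
import math

def logistic_probability(b0, b1, x):
    """Event probability under the logit link: 1 / (1 + exp(-(b0 + b1 * x)))."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

def odds_ratio(b1):
    """Multiplicative change in the odds for a one-unit increase in the predictor."""
    return math.exp(b1)

def likelihood_ratio_test_1df(loglik_reduced, loglik_full):
    """Likelihood ratio test when the full model adds ONE parameter.

    Returns (G, p_value), where G = -2 * (loglik_reduced - loglik_full).
    For 1 degree of freedom, P(chi-square > G) equals erfc(sqrt(G / 2)).
    """
    g = -2.0 * (loglik_reduced - loglik_full)
    if g < 0.0:
        raise ValueError("full model should not have a lower log-likelihood")
    return g, math.erfc(math.sqrt(g / 2.0))

# Hypothetical values chosen only to make the arithmetic easy to follow.
b0, b1 = -3.0, 0.05                          # assumed intercept and slope
print(logistic_probability(b0, b1, 60))      # linear predictor is 0, so probability is 0.5
print(odds_ratio(b1))

g, p_value = likelihood_ratio_test_1df(-400.0, -398.0)  # assumed log-likelihoods
print(g, p_value)                            # G = 4.0; reject at the 5% level if p_value < 0.05
```

Which category of SEX counts as the "event" depends on how the response is coded in Minitab, so the sign of the slope should be read against that coding.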