EPE/EDP 660 Exam 3
Dr. Kelly Bradley

{2 points} Name: ____________________

You MUST work alone: no tutors; no help from classmates. Email me or see me with questions. You will receive a score of 0 if this rule is violated.

Output from Minitab (or other approved software) must be included {2 points}. If using Minitab, include the session window. Do NOT include a copy of the worksheet. Answers must be clearly labeled. In order to receive partial credit, work must be shown.

PART A (18 POINTS): FILL IN THE BLANK (with best choice) {2 points per blank}

(1) The number of levels of a quantitative variable must be at least ______ more than the order of the polynomial in x that you want to fit.

(2) When 2 or more independent variables are moderately to highly correlated with each other, it is reasonable to suspect an issue with ___________________.

(3) Predicting y when the x values are outside the range of experimentation is ___________________.

(4) ___________________ Regression is a screening method that starts with no predictors. Each of the available predictors is evaluated with respect to how much R² would be increased by adding it to the model.

(5) The ___________________ can be regarded as a random sample from a N(0, σ²) distribution, so we can check this assumption by checking whether the residuals might have come from a normal distribution.

(6) To fit a straight line, you need at least ______ different x values, and to fit a curve you need at least ______.

(7) An observation that is larger than 2 or 3s is a/n ___________________.

(8) In ___________________ regression, the β parameter is interpreted as the percentage change in odds for every 1-unit increase in x_i, holding all other x’s fixed.

PART B: Short Answer (23 POINTS)

(1) In addition to independent or predictor variables being highly correlated, how can you assess if multicollinearity is present? Explain. {4 points}

(2) Considering the regression setting, list the assumptions about ε. If the assumptions do not hold, what are the potential consequences? Explain. {6 points}

(3) Why would we want to use the standardized β coefficients over the regular β coefficients? Explain. {4 points}

(4) What is meant by parsimony with regard to regression models? Are there times when it is good and times when it is bad? Explain. {3 points}

(5) What is the difference between homoscedastic and heteroscedastic, and which is preferable? {3 points}

(6) How is logistic regression unique in terms of the dependent variable? In what way/s is this helpful? {3 points}

PART C: Data Analysis (55 points)

A college dean desires to estimate students’ GPA after their first semester. The dean takes a random sample of 94 freshmen currently enrolled at the college and records their ACT scores, listed by section and as a comprehensive score (ACT Composite), and High School GPA (HS GPA). The table below contains data for the sample. (Only the first 6 observations are presented. Use the full data set for your analysis.)

*Adequate ACT (highlighted in table) is discussed and used in item (9).

Term GPA  HS GPA  ACT English  ACT Math  ACT Reading  ACT Science  ACT Composite  Adequate ACT*
4.00      1.83    25           24        24           24           24             1
3.79      3.44    23           21        21           24           22             1
4.00      3.31    29           18        31           22           25             1
3.93      3.37    28           30        26           26           28             1
3.92      4.00    27           25        30           23           26             1
4.00      3.19    16           18        18           16           17             0

(1) Produce a graphical summary for the y-variable, Term GPA. Describe the general distribution of y; include discussion of central tendency and variability. {2 points}

(2) Produce descriptive statistics for all potential predictors (independent variables), HS GPA through ACT Comp; exclude Adequate ACT. At a minimum, include mean, median, standard deviation, and range. Describe general trends, distributions, etc. {3 points}

(3) Produce a correlation matrix and matrix plot of all the variables, excluding Adequate ACT. Do you see any “strong” correlations? Defend. {3 points}

(4) Compute the regression equation, R-square, VIF, standardized coefficient estimates, and standardized residuals for the regression model with all potential independent variables as predictors of Term GPA. (Include the ANOVA table.) Submit the first 8 rows of your Minitab worksheet for this item. {5 points}
    i. What are the R-square and R-square (adjusted)? What does this tell us? {3 points}
    ii. Overall, do you feel this is a reasonable model? Defend. {3 points}

(5) Conduct a Stepwise regression analysis of the data. List the best equation. Why is it your choice? (Be sure to explain how the decision was made.) Defend. {5 points}

(6) Conduct a Backward Elimination regression analysis of the data. Why is it your choice? (Be sure to explain how the decision was made.) Defend. {5 points}

(7) Conduct a Best Subsets regression analysis of the data; include PRESS. Discuss the results. {3 points}

(8) Now, compute the regression equation for estimating Term GPA as a function of ACT Comp and HS GPA. Include VIF, standardized coefficient estimates, and standardized residuals, along with the ANOVA table. {3 points}
    i. Check the assumptions of regression. Be sure that you have produced the standardized residuals 4-in-1 plot, or constructed the appropriate plots. {4 points}
    ii. Produce Hi (leverages), Cook’s Distance, and DFITS. Explore, identify, and discuss outliers and leverage points. {4 points}

(9) A new variable, Adequate ACT, was computed in C8. If the ACT Comp score is greater than 20, then Adequate ACT = 1; if not, then Adequate ACT = 0. Produce a binary logistic regression model to predict an Adequate ACT score as a function of High School GPA. {2 points}
    i. Report the maximum likelihood values of the estimates. {2 points}
    ii. Report the odds-ratios and compute the percent increase or decrease in the estimate of odds of Adequate ACT. {2 points}
    iii. Test the overall adequacy of the model. Be sure to report the test statistic and p-value. {3 points}

(10) From all the regression equations produced above, which do you feel is most desirable? Write the equation. Defend your choice. {3 points}
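
For reference, the Adequate ACT coding rule stated in item (9) can be sketched as follows. This is a minimal illustration in Python rather than Minitab, and it is not part of the required output. The function name code_adequate_act is hypothetical, and the scores are only the six ACT Composite values shown in the Part C table, not the full data set.

```python
# ACT Composite scores for the six observations shown in the Part C table.
act_composite = [24, 22, 25, 28, 26, 17]

# Cutoff stated in item (9): a composite score greater than 20 is coded 1.
CUTOFF = 20


def code_adequate_act(scores, cutoff=CUTOFF):
    """Return 1 for each score strictly greater than cutoff, otherwise 0.

    Hypothetical helper for illustration; the exam itself stores this
    indicator in worksheet column C8.
    """
    return [1 if score > cutoff else 0 for score in scores]


adequate_act = code_adequate_act(act_composite)
print(adequate_act)
```

The comparison is strict, so a composite score of exactly 20 is coded 0, matching the wording "greater than 20" in item (9).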