EPS 625 – INTERMEDIATE STATISTICS
CDAA NO. 5 – BLOCK ENTRY MULTIPLE REGRESSION

You are interested in better understanding who does well and who does poorly in statistics courses. You have been provided with a dataset of 100 randomly selected students who have taken a college statistics course. The data contain:
- each student's average performance on statistics exams;
- each student's scores on math and English aptitude tests taken in the senior year of high school; and
- each student's high school grade point averages in math, English, and all other courses.

The specific research questions for this study are:
“How well do high school test scores and grade point averages (GPA) predict a student’s test performance in statistics courses?”
“Is it necessary to have both high school test scores and grade point averages as predictors of exam scores in statistics?”

The dataset for this assignment is on the web site, labeled CDAA 5 (MLR) – Dataset. It contains a total of 6 variables for a sample of 100 students. For this assignment, use an a priori alpha level of .05 (α = .05) for all analyses. The dependent measure is the students’ average percentage correct on exams in a college statistics course (statexam). The independent variables are as follows:

mathtest   Score on a math aptitude test taken senior year of high school
engtest    Score on an English aptitude test taken senior year of high school
math_gpa   High school GPA in math courses
eng_gpa    High school GPA in English courses
othr_gpa   High school GPA in courses other than English and math

Once you have obtained the dataset, complete/answer each of the following questions. Be brief, but be thorough in order to receive the maximum possible points for each question. Be sure to answer all parts and sub-parts of each question. DO NOT write “see attached SPSS Output” to answer the following questions.
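This design has N = 100 students and k = 5 predictors. With those two numbers, the screening cutoffs needed for the diagnostics in Question 1 can be worked out before running SPSS. The Python sketch below computes a set of common rule-of-thumb criteria; Python is used only for illustration, since the analyses themselves are run in SPSS.

Every threshold in the sketch is an assumption taken from common textbook practice, not from the in-class example:
- leverage: 2(k + 1)/n and 3(k + 1)/n;
- Cook's distance: 4/n and 1;
- standardized residuals: |3|;
- multicollinearity: VIF > 10 or tolerance < .10;
- Mahalanobis distance: χ²(5) = 20.515 at p = .001.

Use the criteria from the in-class example wherever they differ. The names `screening_cutoffs`, `vif_from_r2`, and `CHI2_CRIT_P001_DF5` are hypothetical.

```python
# Sketch: rule-of-thumb screening cutoffs for a regression with n cases and k predictors.
# All thresholds are ASSUMPTIONS based on common textbook criteria; replace them with
# the criteria from the in-class example if those differ.

# Critical chi-square value at p = .001 with df = 5, from a standard chi-square table.
# Valid only when k = 5 (the number of predictors in this assignment).
CHI2_CRIT_P001_DF5 = 20.515


def screening_cutoffs(n: int, k: int) -> dict[str, float]:
    """Return cutoffs for the multicollinearity, outlier, and influence checks."""
    p = k + 1  # estimated coefficients, including the intercept
    return {
        "leverage_2p_over_n": 2 * p / n,  # hat value h_ii above this: look at the case
        "leverage_3p_over_n": 3 * p / n,  # hat value h_ii above this: outlier on the predictors
        # SPSS saves centered leverage (LEV_ = h_ii - 1/n), so shift the cutoff by 1/n.
        "centered_leverage_3p_over_n": 3 * p / n - 1 / n,
        "cooks_d_liberal": 4 / n,  # Cook's D above 4/n: worth a closer look
        "cooks_d_conservative": 1.0,  # Cook's D above 1: influential
        "abs_std_residual": 3.0,  # |standardized residual| above 3: outlier on y
        "vif_max": 10.0,  # VIF above 10: multicollinearity concern
        "tolerance_min": 0.10,  # tolerance (= 1 / VIF) below .10: same concern
    }


def vif_from_r2(r2_j: float) -> float:
    """VIF for predictor j, where r2_j is R^2 from regressing x_j on the other predictors."""
    return 1.0 / (1.0 - r2_j)


if __name__ == "__main__":
    cutoffs = screening_cutoffs(n=100, k=5)
    for name, value in cutoffs.items():
        print(f"{name}: {value:.3f}")
    print(f"mahalanobis_chi2_p001_df5: {CHI2_CRIT_P001_DF5}")
```

For N = 100 and k = 5, the hat-value cutoffs are .12 and .18. If you compare against SPSS's saved centered leverage values instead, the 3(k + 1)/n cutoff becomes .17.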
The only items that may be placed on a separate page instead of in your written answers are the formula calculation for leverage and the profile or case summary.

1. First, check for multicollinearity, potential outliers, and influential data points. Conduct the appropriate diagnostic analyses and indicate your findings below. Include:
- what you found;
- the criteria (including any applicable formulas) against which it was judged; and
- whether or not there is a concern.

This summary information will assist you in writing your BONUS results section.

2. Based on your findings in Question No. 1, indicate below the values that exceeded the criteria. You DO NOT need to print (and please don’t) these output tables; simply list the values of concern with the applicable diagnostic item. This will also help you create your temporary commands for further analyses.

3. Create a “profile” or “case summary” for the identified data points by using the list command or the case summary option with a temporary select command. Leave this with your SPSS output; that is where I will check that it was done and that it is accurate.

4. Using the full dataset, run the regression analysis to answer the two research questions:
“How well do high school test scores and grade point averages (GPA) predict a student’s test performance in statistics courses?”
“Is it necessary to have both high school test scores and grade point averages as predictors of exam scores in statistics?”

4a. In answering the first research question, indicate what proportion of the total variance in statistics course performance is explained by the entire set of independent variables. Is this proportion of explained variance significant? Indicate how you made that decision.

4b. What proportion of the total variance in statistics course performance is explained by the set of control (aptitude) variables? Is this proportion of explained variance significant? Indicate how you made that decision.

4c. The second research question asks whether the set of GPA variables explains a significant proportion of additional variance in the dependent measure. In answering it, indicate what proportion of the total variance in statistics course performance is explained by the set of grade point average variables over and above the aptitude variables. Is this proportion of explained variance significant? Indicate how you made that decision.

5. Of the five independent variables, which ones (if any) have a significant influence on the dependent measure? Indicate how you made your determination.

6. List the independent variables in order of relative importance, from greatest to least influence. Indicate how you made your determination. Be careful in your selection.

7. Now run the same analysis as above with the potentially influential data points removed, and answer the following questions.

7a. Were there any significant changes in your overall results? Indicate how you made your determination by stating what you compared.

7b. Based on your findings in Question No. 7a, would you keep the full dataset, or is there a potential need to remove the influential data points?

As a reminder:
- If the results are not affected by the removal of the identified data, simply mention this in your write-up (see the in-class example).
- If the results are affected by the removal of the identified data, write a separate paragraph describing the changes/differences.
- Report the results from the full dataset in your results section, then refer your reader to an appendix for the results obtained after removing the identified data.

Ultimately, which set of results to use is a judgment call, and a strong and valid rationale for removing data is warranted.

Turn in all of your syntax (separately if it is not appended to your SPSS output).
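Questions 5 and 6 both rely on SPSS's Coefficients table:
- Significance (Question 5): compare each predictor's Sig. value for its t test to α = .05.
- Relative importance (Question 6): one common approach orders predictors by the absolute value of the standardized coefficient (Beta) from the final model. Unstandardized B values depend on each predictor's scale, for example aptitude-test points versus GPA units.

The Python sketch below shows the conversion β = B × s_x / s_y and the ranking step. All coefficient and SD values in it are hypothetical placeholders, not results from the CDAA 5 dataset. Check against the in-class example whether only the predictors found significant in Question 5 should be ranked.

```python
# Sketch: rank predictors by the absolute size of their standardized coefficients.
# beta_j = B_j * (SD of x_j / SD of y). All numbers below are HYPOTHETICAL
# placeholders, NOT results from the CDAA 5 dataset.


def standardized_beta(b: float, sd_x: float, sd_y: float) -> float:
    """Convert an unstandardized coefficient B into a standardized beta."""
    return b * sd_x / sd_y


def rank_by_abs_beta(coefs: dict[str, float], sds: dict[str, float],
                     sd_y: float) -> list[tuple[str, float]]:
    """Return (predictor, beta) pairs sorted from greatest to least |beta|."""
    betas = {name: standardized_beta(b, sds[name], sd_y) for name, b in coefs.items()}
    return sorted(betas.items(), key=lambda item: abs(item[1]), reverse=True)


# Hypothetical final-model coefficients (B) and predictor SDs, for illustration only.
b_values = {"mathtest": 0.10, "engtest": 0.02, "math_gpa": 4.0, "eng_gpa": -1.0, "othr_gpa": 2.0}
sd_values = {"mathtest": 80.0, "engtest": 90.0, "math_gpa": 0.6, "eng_gpa": 0.5, "othr_gpa": 0.4}

if __name__ == "__main__":
    for name, beta in rank_by_abs_beta(b_values, sd_values, sd_y=12.0):
        print(f"{name}: beta = {beta:.3f}")
```

With these made-up numbers, math_gpa has the largest B, but mathtest has the largest β. This is why raw B values are not a safe basis for ranking predictors measured on different scales.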
Remember: for this assignment you will primarily be using the drop-down menus; however, you will have to make some adjustments to the syntax to answer the above questions correctly. Refer to your in-class example to guide you.

Turn in all of your output that assisted you in answering the above questions. This includes:
- the initial data check;
- the regression analyses with the identified data points (including descriptive statistics) and without them; and
- the profile (list) of the identified data points.

My suggestion would be to use landscape (as opposed to portrait) orientation to save paper.

(BONUS – up to 5 points) Write a results section for this analysis. Your results section should be similar to the in-class example, including the two tables and the complete narrative. Be sure to use the example as a guide so as not to lose points. Double-check your writing and numbers before turning in this assignment.

Table 1
Means, Standard Deviations, and Correlations for Regression of Statistics Course Performance (N = 100)

Variable                        1       2       3       4       5       6
1. Statistics Exam          1.000
2. Math Aptitude Test               1.000
3. English Aptitude Test                    1.000
4. Math GPA                                         1.000
5. English GPA                                              1.000
6. Other Courses GPA                                                1.000
Means
Standard Deviations

Table 2
Results of Regression of Statistics Course Performance on Test Scores and GPA

Independent Variables             B         t
Model 1
   Math Aptitude Test
   English Aptitude Test
Model 2
   Math Aptitude Test
   English Aptitude Test
   Math GPA
   English GPA
   Other Courses GPA

Note. R² = ________ for Model 1 (p ________); ΔR² = ________ for Model 2 (p ________); Total R² = ________ (p ________).
***p < .001
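The three values in the Table 2 note follow from the R² of each model. This design has n = 100, k₁ = 2 aptitude predictors in Model 1, and k₂ = 5 predictors in Model 2. Two F tests apply:
- Overall test of R²: F = (R²/k) / ((1 − R²)/(n − k − 1)).
- Test of the GPA block: F_change = ((R²₂ − R²₁)/(k₂ − k₁)) / ((1 − R²₂)/(n − k₂ − 1)).

SPSS prints the block test as F Change and Sig. F Change in the Model Summary when R squared change statistics are requested (the CHANGE keyword on /STATISTICS).

The Python sketch below only reproduces the F statistics and their degrees of freedom. The R² values in it are hypothetical, not results from the dataset, and p-values still come from SPSS or an F table. The names `f_overall`, `f_change`, and `apa` are hypothetical.

```python
# Sketch: F tests behind R^2 (Model 1), Delta R^2 (Model 2), and total R^2.
# The R^2 values used below are HYPOTHETICAL placeholders, not dataset results.


def f_overall(r2: float, k: int, n: int) -> tuple[float, int, int]:
    """F test of R^2 for a model with k predictors; returns (F, df1, df2)."""
    df1, df2 = k, n - k - 1
    return (r2 / df1) / ((1.0 - r2) / df2), df1, df2


def f_change(r2_full: float, r2_reduced: float, k_full: int, k_reduced: int,
             n: int) -> tuple[float, int, int]:
    """F test of the R^2 increase when a block of predictors is added; (F, df1, df2)."""
    df1, df2 = k_full - k_reduced, n - k_full - 1
    return ((r2_full - r2_reduced) / df1) / ((1.0 - r2_full) / df2), df1, df2


def apa(value: float) -> str:
    """Format a proportion with two decimals and no leading zero (e.g., .30)."""
    return f"{value:.2f}".lstrip("0")


if __name__ == "__main__":
    n, k1, k2 = 100, 2, 5               # cases, Model 1 predictors, Model 2 predictors
    r2_model1, r2_model2 = 0.30, 0.45   # hypothetical values for illustration
    f1, a1, b1 = f_overall(r2_model1, k1, n)
    fc, ac, bc = f_change(r2_model2, r2_model1, k2, k1, n)
    ft, at, bt = f_overall(r2_model2, k2, n)
    print(f"Model 1: R2 = {apa(r2_model1)}, F({a1}, {b1}) = {f1:.2f}")
    print(f"Model 2: delta R2 = {apa(r2_model2 - r2_model1)}, F({ac}, {bc}) = {fc:.2f}")
    print(f"Total:   R2 = {apa(r2_model2)}, F({at}, {bt}) = {ft:.2f}")
```

For this design, the tests have df = (2, 97) for Model 1, (3, 94) for the GPA block, and (5, 94) for the full model.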