Part One – Linear Regression: Prediction (Chapter 6)

EPS 625 – INTERMEDIATE STATISTICS
CDAA NO. 5 – BLOCK ENTRY MULTIPLE REGRESSION
You are interested in better understanding who does well and who does poorly in statistics
courses. You have been provided with a set of data on 100 randomly selected students
who have taken a college statistics course. The data contain their average performance on
statistics exams, their scores on math and English aptitude tests taken in their senior year
of high school, and their high school grade point averages in math, English, and all
other courses. The specific research questions for this study are: “How well do high school
test scores and grade point averages (GPA) predict a student’s test performance in
statistics courses?” and “Is it necessary to have both high school test scores and grade
point averages as predictors of exam scores in statistics?”
The dataset for this assignment is on the web site, labeled CDAA 5 (MLR) – Dataset. The
dataset contains a total of 6 variables for a sample of 100 students. For this assignment, you will
use an a priori alpha level of .05 (α = .05) for all analyses. Your dependent measure will be the
students’ average percentage correct on exams in a college statistics course (statexam) and the
independent variables are as follows:
mathtest    Score on a math aptitude test taken senior year of high school
engtest     Score on an English aptitude test taken senior year of high school
math_gpa    High school GPA in math courses
eng_gpa     High school GPA in English courses
othr_gpa    High school GPA in courses other than English and math
Once you have obtained the data set, complete/answer each of the following questions. Be brief
– but be thorough in order to receive the maximum possible points for each question. Be sure to
answer all parts and sub-parts of each question. DO NOT indicate, “see attached SPSS Output”
to answer the following questions. The only exceptions are: the formula calculation for leverage
and the profile or case summary which will be a separate page.
1. First, check for multicollinearity, potential outliers, and influential data points. Conduct the
appropriate diagnostic analyses and indicate your findings below. Be sure to include what
you found, the criteria (including any applicable formulas) against which it was judged,
and indicate whether there is a concern or not. This summary information will assist you in
writing your BONUS results section.
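The criteria asked for in Question 1 are usually rules of thumb tied to the sample size (n) and the number of predictors (k). The following is a minimal sketch, written in Python rather than SPSS, of how such cutoffs might be computed and applied. The specific cutoffs are common textbook conventions and are an assumption here; use the criteria from your in-class example if they differ. The function names and the leverage values are hypothetical. Note also that SPSS saves centered leverage values (the hat value minus 1/n), so check which version your criterion assumes.

```python
def diagnostic_cutoffs(n, k):
    """Rule-of-thumb cutoffs for n cases and k predictors.

    These are assumed textbook conventions, not necessarily the course's criteria.
    """
    return {
        "leverage_2x": 2 * (k + 1) / n,  # hat value above twice the average (k + 1) / n
        "leverage_3x": 3 * (k + 1) / n,  # hat value above three times the average
        "cooks_d": 1.0,                  # Cook's distance above 1
        "std_resid": 3.0,                # absolute standardized residual above 3
        "vif": 10.0,                     # variance inflation factor above 10
        "tolerance": 0.10,               # tolerance (1 / VIF) below .10
    }


def flag_cases(values, cutoff):
    """Return the 1-based case numbers whose value exceeds the cutoff."""
    return [i + 1 for i, v in enumerate(values) if v > cutoff]


# This assignment has n = 100 students and k = 5 independent variables.
cutoffs = diagnostic_cutoffs(n=100, k=5)

# Made-up leverage values for four cases, for illustration only.
example_leverage = [0.03, 0.15, 0.05, 0.20]
flagged = flag_cases(example_leverage, cutoffs["leverage_2x"])
print(cutoffs["leverage_2x"], flagged)
```

Listing the flagged case numbers this way is the same bookkeeping Question 2 asks for: the value of concern next to the diagnostic it failed.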
2. Based on your findings in Question No. 1, indicate below the values that exceeded the
criteria. You DO NOT need to print (and please don’t) these output tables – simply list the
values of concern with the applicable diagnostic item. This will also serve to assist you in
creating your temporary commands for further analyses.
3. Create a “profile” or “case summary” for the identified data points by using the list command
or the case summary option with a temporary select command. Leave this with your SPSS
output – that is where I will check that it was done and that it is accurate.
4. Using the full dataset – run the regression analysis to answer the following questions:
How well do high school test scores and grade point averages (GPA) predict a
student’s test performance in statistics courses? and Is it necessary to have both
high school test scores and grade point averages as predictors of exam scores in
statistics?
4a. In answering the first research question, indicate what proportion of the total variance in
statistics course performance is explained by the entire set of independent variables.
Is this proportion of explained variance significant? Indicate how you made that decision.
4b. What proportion of the total variance in statistics course performance is explained by
the set of control (aptitude) variables? Is this proportion of explained variance
significant? Indicate how you made that decision.
4c. In answering the second research question as to whether the set of GPA variables
explained a significant proportion of additional total variance in the dependent measure,
indicate what proportion of the total variance in statistics course performance is
explained by the set of grade point average variables. Is this proportion of explained
variance significant? Indicate how you made that decision.
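The significance decisions in Questions 4a through 4c rest on F tests of R² or of the change in R² between blocks. As a minimal sketch (in Python, not SPSS), the F statistic for a block of m added predictors can be computed as below. The function name and the R² values are hypothetical and are not results from the CDAA 5 dataset. In SPSS the corresponding values appear as R Square Change and F Change in the Model Summary table when change statistics are requested.

```python
def f_change(r2_reduced, r2_full, n, k_full, m):
    """F test for the R-squared change when m predictors are added.

    r2_reduced: R-squared of the model without the added block
    r2_full:    R-squared of the model with the added block
    n:          number of cases
    k_full:     number of predictors in the fuller model
    m:          number of predictors in the added block
    Returns (F, df1, df2).
    """
    df1 = m
    df2 = n - k_full - 1
    f = ((r2_full - r2_reduced) / df1) / ((1.0 - r2_full) / df2)
    return f, df1, df2


# Hypothetical R-squared values, for illustration only.
# Block 1: two aptitude tests entered against an empty model.
f_model1, df1_model1, df2_model1 = f_change(0.0, 0.30, n=100, k_full=2, m=2)

# Block 2: three GPA variables added to the two aptitude tests.
f_model2, df1_model2, df2_model2 = f_change(0.30, 0.50, n=100, k_full=5, m=3)

print(round(f_model1, 2), df1_model1, df2_model1)
print(round(f_model2, 2), df1_model2, df2_model2)
```

Each F is then compared with the critical value for its degrees of freedom at α = .05, or equivalently its p value is compared with .05.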
5. Of the five independent variables, which ones (if any) have a significant influence on the
dependent measure? Indicate how you made your determination.
6. List the independent variables based on their relative importance from greatest to least
influence. Indicate how you made your determination. Be careful with your selection.
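One reason for care in Question 6 is that unstandardized coefficients (B) depend on each variable's unit of measurement, so they cannot be compared directly across predictors. The sketch below (Python, with made-up numbers that are not results from the CDAA 5 dataset) converts B to a standardized coefficient using beta = B × (SD of predictor / SD of outcome) and ranks by absolute value. Ranking by standardized coefficients is an assumption here; the in-class example may call for a different index of relative importance.

```python
def standardized_betas(b, sd_x, sd_y):
    """Convert unstandardized slopes B to standardized betas: beta = B * sd_x / sd_y."""
    return {name: b[name] * sd_x[name] / sd_y for name in b}


# Hypothetical values for illustration only; NOT results from the CDAA 5 dataset.
b = {"mathtest": 0.05, "math_gpa": 4.0}     # unstandardized slopes
sd_x = {"mathtest": 80.0, "math_gpa": 0.5}  # predictor standard deviations
sd_y = 10.0                                 # outcome standard deviation

betas = standardized_betas(b, sd_x, sd_y)

# Rank predictors from largest to smallest absolute standardized coefficient.
ranked = sorted(betas, key=lambda name: abs(betas[name]), reverse=True)
print(betas, ranked)
```

In this made-up case math_gpa has by far the larger B, yet mathtest has the larger standardized coefficient, which is why the choice of column matters.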
7. Now run the same analysis as above – with the removal of the potentially influential data
points – and answer the following questions.
7a. Were there any significant changes in your overall results? Indicate how you made your
determination by indicating what you compared.
7b. Based on your findings in Question No. 7a, would you keep the full data set, or is
there a potential need to remove the influential data points?
As a reminder:

If the results are not affected by the removal of the identified data – simply make
reference to it in your write up (see in-class example).

If the results are affected by the removal of the identified data – create a separate
paragraph indicating the changes/differences.

Report the results from the full data set in your results section – then reference
your reader to an appendix for the subsequent results created by the removal of
the identified data. It ultimately becomes a judgment call as to which set of results
will be used (a strong and valid rationale for the data’s removal is warranted).
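The comparison in Question 7 amounts to fitting the same model twice, once on all cases and once with the flagged cases excluded, and then setting the two sets of results side by side. The following is a minimal sketch of that workflow in plain Python, using a small made-up dataset with two predictors (not the CDAA 5 data). The function name `ols_fit`, the data, and the flagged case number are all hypothetical. In SPSS the same comparison is done by rerunning the regression with a temporary select command.

```python
def ols_fit(rows, y):
    """Least-squares fit of y = b0 + b1*x1 + ... ; returns (coefficients, R-squared)."""
    X = [[1.0] + [float(v) for v in r] for r in rows]  # add intercept column
    p = len(X[0])
    # Augmented normal equations [X'X | X'y].
    A = [[sum(xi[a] * xi[c] for xi in X) for c in range(p)]
         + [sum(xi[a] * yi for xi, yi in zip(X, y))] for a in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(p):
            if r != c:
                factor = A[r][c] / A[c][c]
                A[r] = [v - factor * w for v, w in zip(A[r], A[c])]
    coefs = [A[i][p] / A[i][i] for i in range(p)]
    fitted = [sum(b * v for b, v in zip(coefs, xi)) for xi in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return coefs, 1.0 - ss_res / ss_tot


# Made-up data: y = 1 + 2*x1 + 3*x2 exactly, except that case 8 is shifted upward.
rows = [(1, 2), (2, 1), (3, 5), (4, 3), (5, 8), (6, 4), (7, 9), (8, 6)]
y = [1 + 2 * a + 3 * b for a, b in rows]
y[7] += 20  # case 8 plays the role of an influential data point

flagged_cases = {8}  # 1-based case numbers identified by the diagnostics
keep = [i for i in range(len(y)) if (i + 1) not in flagged_cases]

coefs_full, r2_full = ols_fit(rows, y)
coefs_reduced, r2_reduced = ols_fit([rows[i] for i in keep], [y[i] for i in keep])

print("full:   ", [round(c, 3) for c in coefs_full], round(r2_full, 3))
print("reduced:", [round(c, 3) for c in coefs_reduced], round(r2_reduced, 3))
```

What to compare is the same in either tool: R², the significance of each block, and which coefficients are significant, with and without the flagged cases.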
Turn in all of your syntax (separately if not appended to your SPSS output). Remember – for this
assignment you will primarily be using the drop down menus – however, you will have to make
some adjustments to the syntax to correctly answer the above questions. Refer to your in-class
example to guide you.
Turn in all of your output that assisted you in answering the above questions. This includes the
initial data check, the regression with (including descriptive statistics) and without the
identified data points, and the profile (list) of the identified data points. My suggestion would
be to use landscape (as opposed to portrait) to save paper.
(BONUS – up to 5 points) Write a results section for this analysis. Your results section should be
similar to the in-class example, including the two tables and the complete narrative. Be sure
to use the example as a guide so as not to lose points. Double-check your writing and
numbers before turning in this assignment.
Table 1
Means, Standard Deviations, and Correlations for Regression of Statistics Course Performance (N = 100)
                                 1       2       3       4       5       6
1. Statistics Exam           1.000
2. Math Aptitude Test                1.000
3. English Aptitude Test                     1.000
4. Math GPA                                          1.000
5. English GPA                                               1.000
6. Other Courses GPA                                                 1.000
Means
Standard Deviations
Table 2
Results of Regression of Statistics Course Performance on Test Scores and GPA
Independent Variables               B          t
Model 1
  Math Aptitude Test
  English Aptitude Test
Model 2
  Math Aptitude Test
  English Aptitude Test
  Math GPA
  English GPA
  Other Courses GPA
Note. R² = ________ for Model 1, (p ________);
ΔR² = ________ for Model 2, (p ________);
Total R² = ________, (p ________).
***p < .001