Computer Assignment #3

MATH 210 – Computer Assignment #3 – 25 Points
Due Thursday, 5/8/08
You will be working in the same group and using the same 100 data values you had for Assignments
#1 and #2. In this assignment you will be analyzing the relationship between a student’s GPA after three
college semesters and the variables HSM, HSE, SATM and SATV. We wish to determine which of
these four explanatory variables, measured when the students were in high school, provides the best
linear predictor for their college GPA. We will be ignoring the students’ sex in this assignment.
Use SPSS to construct scatterplots and perform linear regression analysis as described below. Make
sure to put back in any outliers you deleted for Assignment #2. Type a double-spaced report
summarizing your findings for Purdue computer science majors. Your introduction should
summarize the population being studied, the variables of interest, the statistical methods being
used, and your conclusions in the context of this study. Attach all supporting SPSS output and
highlight relevant values on your output. Follow the report guidelines given with Assignment #1.
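SPSS produces the regression numbers for you. The short Python sketch below is only a hedged cross-check of what that output means. It applies the standard least-squares formulas b₁ = S_xy/S_xx, b₀ = ȳ - b₁x̄ and r = S_xy/√(S_xx·S_yy), where S_xx, S_yy and S_xy are sums of squares and cross-products about the means. A residual is the observed y minus the predicted y. The function name `least_squares` and the five toy data points are illustrative assumptions. They are not part of the assignment data or of SPSS.

```python
import math


def least_squares(x, y):
    """Fit y-hat = b0 + b1*x by least squares; return (b0, b1, r, residuals)."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two equal-length lists with at least 3 points")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sums of squares and cross-products about the means
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                   # slope
    b0 = y_bar - b1 * x_bar          # the line passes through (x_bar, y_bar)
    r = sxy / math.sqrt(sxx * syy)   # correlation coefficient
    # Residual = observed y minus predicted y
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    return b0, b1, r, residuals


# Hypothetical toy data standing in for (x, GPA) pairs; not study data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1, r, res = least_squares(x, y)
print(f"GPA-hat = {b0:.2f} + {b1:.2f}x, r = {r:.4f}, r^2 = {r * r:.2f}")
```

For the toy data, the sketch gives ŷ = 2.20 + 0.60x and r ≈ 0.7746, so r² = 0.60. In other words, about 60% of the variation in y is accounted for by the straight-line relationship with x, which is the kind of statement item 7 asks for. The residuals always sum to zero for a least-squares line.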
1. Construct separate scatterplots of GPA versus each of the four explanatory variables (i.e., GPA vs. HSM,
GPA vs. HSE, GPA vs. SATM, and GPA vs. SATV) including the LSR line (choose “Elements, Fit Line at
Total” from the chart editor). Describe each scatterplot in terms of direction, form and strength. Which
association appears to be the strongest? Which appears to be the weakest?
2. Run the regression analysis four times with GPA as your response (y) variable (i.e., GPA vs. HSM, GPA
vs. HSE, etc.). Be sure to save the residuals each time. Summarize the results in your report, stating the
correlation and the least-squares regression (LSR) equation for each of the four relationships.
3. Construct a residual plot for each of these four relationships and describe the results. Do the plots show
random scatter (indicating that the LSR line fits the data well), or are any patterns present?
4. Which of the four explanatory variables (HSE, HSM, SATV or SATM) provides the best linear predictor of
Purdue C.S. students’ GPAs? This should be determined by first eliminating any relationships that are
clearly non-linear (based on the residual plots). Of those that appear fairly linear, choose the relationship with
the strongest correlation (r nearest +1 or –1). Recall that we are assuming a relationship is linear unless there
is clear curvature in the residual plot. For the rest of this assignment use this variable as your explanatory
(x) variable and ignore the other three variables.
5. For the relationship you chose in #4, are there any data points which would be considered influential
observations? Discuss your conclusion. If there are any points that seem especially influential, remove them
from the data, make a new scatterplot and residual plot, and report the new LSR equation and correlation.
6. What does the slope of the LSR line tell you about how students’ GPAs tend to change as x increases?
7. What percent of the variation in students’ GPAs can be accounted for by the variation in x? Explain!
8. Use the regression output to test whether the regression slope, β₁, is equal to zero. What does your
conclusion say about the relationship between GPA and x in the entire population of interest? Write down
the null and alternative hypotheses, report the P-value (found in SPSS) and explain your conclusion.
9. Before constructing confidence and prediction intervals check that the residuals are approximately normal.
Interpret the appropriate SPSS output and explain your conclusion.
10. Add the confidence and prediction limit curves to your scatterplot (Use “Elements, Fit Line at Total,
Confidence Intervals” with the default of 95% confidence). Choose any x value that is contained in your
data set. What is the predicted y value (i.e., GPA)? What is the 95% prediction interval for GPA when x
takes on the chosen value? What is the 95% confidence interval for the mean value of GPA when x takes on
this value? Interpret the meaning of these intervals in the context of the study.
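For item 5, one informal way to judge whether a point is influential is to drop it, refit the line, and see how much the slope moves. Below is a minimal sketch of that leave-one-out check, assuming the data are plain Python lists. The names `fit_line` and `slope_changes` are hypothetical, and so are the six toy points; the last point is deliberately placed far out in x. What counts as a "large" change remains a judgement call.

```python
def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line through (x, y)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def slope_changes(x, y):
    """For each point i, refit without it and return (new slope - full slope)."""
    _, full_slope = fit_line(x, y)
    changes = []
    for i in range(len(x)):
        # Refit on every point except the i-th one
        _, slope_i = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        changes.append(slope_i - full_slope)
    return changes


# Hypothetical data: the last point sits far to the right of the others in x
x = [1, 2, 3, 4, 5, 15]
y = [2, 4, 5, 4, 5, 2]
changes = slope_changes(x, y)
for xi, yi, d in zip(x, y, changes):
    print(f"drop ({xi}, {yi}): slope changes by {d:+.3f}")
```

Here the far-out point flips the sign of the slope. With it, the slope is about -0.11; without it, the slope is 0.60. Dropping any other single point moves the slope by less than 0.1.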
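For item 8, the t statistic for the slope is t = b₁/SE(b₁) with n - 2 degrees of freedom. Here SE(b₁) = s/√S_xx and s = √(SSE/(n - 2)). The sketch below reproduces that calculation as a hedged cross-check of the value in the SPSS Coefficients table. To keep the example dependency-free, the two-sided P-value is found by integrating the t density numerically with Simpson's rule. That is a swapped-in technique, not how SPSS computes it. The function names and the toy data are hypothetical.

```python
import math


def t_density(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + t * t / df) ** (-(df + 1) / 2)


def two_sided_p(t_stat, df, steps=2000):
    """P(|T| >= |t_stat|), with P(0 <= T <= |t_stat|) found by Simpson's rule."""
    a = abs(t_stat)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_density(0.0, df) + t_density(a, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_density(k * h, df)
    area = total * h / 3  # area under the density from 0 to |t_stat|
    return max(0.0, 1 - 2 * area)  # the t density is symmetric about 0


def slope_test(x, y):
    """Test H0: beta1 = 0 vs Ha: beta1 != 0; return (b1, SE(b1), t, df, P)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    df = n - 2
    s = math.sqrt(sse / df)  # regression standard error
    se_b1 = s / math.sqrt(sxx)
    t_stat = b1 / se_b1
    return b1, se_b1, t_stat, df, two_sided_p(t_stat, df)


# Hypothetical toy data (not study data)
b1, se, t, df, p = slope_test([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(f"b1 = {b1:.3f}, SE = {se:.3f}, t = {t:.3f}, df = {df}, P = {p:.4f}")
```

For these five toy points, t ≈ 2.12 on 3 degrees of freedom and P ≈ 0.12. So H₀: β₁ = 0 would not be rejected at the 5% level.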
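For item 9, a normal quantile (Q-Q) plot places the sorted residuals against the values you would expect from a standard normal distribution. Points lying close to a straight line support approximate normality. This sketch computes those pairs with the standard library's `statistics.NormalDist` (Python 3.8+). Two choices are assumptions for illustration: Blom's plotting positions and the single summary correlation. SPSS's own plots may use different conventions. The function names and the toy residuals are hypothetical.

```python
import math
from statistics import NormalDist


def normal_scores(residuals):
    """Return (expected z, observed residual) pairs for a normal Q-Q plot."""
    n = len(residuals)
    z = NormalDist()  # standard normal: mean 0, sd 1
    ordered = sorted(residuals)
    # Blom's plotting positions (i - 0.375) / (n + 0.25), i = 1..n
    expected = [z.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
    return list(zip(expected, ordered))


def qq_correlation(pairs):
    """Correlation of the Q-Q points; values near 1 mean a nearly straight plot."""
    xs = [a for a, _ in pairs]
    ys = [b for _, b in pairs]
    n = len(pairs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)


# Residuals from a hypothetical toy fit (not study data)
pairs = normal_scores([-0.8, 0.6, 1.0, -0.6, -0.2])
for expected_z, resid in pairs:
    print(f"{expected_z:7.3f}  {resid:6.2f}")
print(f"Q-Q correlation: {qq_correlation(pairs):.3f}")
```

With only a handful of residuals, a Q-Q plot can look ragged even for truly normal data. The 100 observations in the assignment give a much clearer picture.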
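For item 10, both intervals at x = x* are centered at ŷ = b₀ + b₁x*, but they use different standard errors:

- For the mean response: SE = s√(1/n + (x* - x̄)²/S_xx).
- For a single new student: SE = s√(1 + 1/n + (x* - x̄)²/S_xx).

The minimal sketch below assumes the critical value t* (n - 2 degrees of freedom) is looked up in a t table rather than computed. The function `intervals_at` and the toy data are hypothetical.

```python
import math


def intervals_at(x, y, x_star, t_star):
    """Return (y_hat, CI for mean response, PI for one new response) at x_star.

    t_star is the t critical value with n - 2 degrees of freedom, supplied by
    the caller (e.g., from a t table); it is not computed here.
    """
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))  # regression standard error
    y_hat = b0 + b1 * x_star
    lever = 1 / n + (x_star - x_bar) ** 2 / sxx
    se_mean = s * math.sqrt(lever)      # SE of the estimated mean response
    se_pred = s * math.sqrt(1 + lever)  # SE for predicting one new observation
    conf_int = (y_hat - t_star * se_mean, y_hat + t_star * se_mean)
    pred_int = (y_hat - t_star * se_pred, y_hat + t_star * se_pred)
    return y_hat, conf_int, pred_int


# Hypothetical toy data; t* = 3.182 is the table value for df = 3 at 95%
y_hat, conf_int, pred_int = intervals_at([1, 2, 3, 4, 5], [2, 4, 5, 4, 5],
                                         x_star=3, t_star=3.182)
print(f"y-hat = {y_hat:.2f}")
print(f"95% CI for mean: ({conf_int[0]:.2f}, {conf_int[1]:.2f})")
print(f"95% PI for one:  ({pred_int[0]:.2f}, {pred_int[1]:.2f})")
```

The prediction interval is always wider than the confidence interval. It must cover the scatter of one individual student around the line, not just the uncertainty in where the line itself lies.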