Finding R²

REGRESSION TIPS
FALL, 2004
HOFFMAN
Two different ways to arrive at the value R², the “percent of variation in the Y variable that can be explained by the
X variable.” For example: the percent of variation in grades on exam 1 that can be explained by the amount
of time the student took to complete the exam. Note: column references below refer to our regression
worksheet.
1. Calculate r (the correlation coefficient) in the usual manner by
a) finding the z-scores for X (column 3)
b) and the z-scores for Y (column 4)
c) then finding the product of those columns (column 5)
d) and adding up all the numbers in column 5
e) then dividing that total by (number of pairs – 1) to get r.
f) Use r * r * 100% as the percent of variation explained.
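The steps in method 1 can be sketched in a few lines of Python. This is only an illustration, not part of the worksheet: the data values are made up, the function name is hypothetical, and the z-scores are assumed to use the sample standard deviation (the one that divides by number of pairs – 1), which matches the division in step e.

```python
from statistics import mean, stdev

def percent_explained_from_r(xs, ys):
    """Return (r, percent of variation explained) using the z-score method."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    # Assumption: sample standard deviations (divide by n - 1).
    s_x, s_y = stdev(xs), stdev(ys)
    z_x = [(x - x_bar) / s_x for x in xs]          # column 3: z-scores for X
    z_y = [(y - y_bar) / s_y for y in ys]          # column 4: z-scores for Y
    products = [a * b for a, b in zip(z_x, z_y)]   # column 5: products
    r = sum(products) / (n - 1)                    # steps d and e
    return r, r * r * 100                          # step f

# Made-up data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r, pct_from_r = percent_explained_from_r(xs, ys)
print(f"r = {r:.4f}, percent explained = {pct_from_r:.1f}%")
```

For this made-up data the sketch gives r of about 0.7746, so about 60% of the variation in Y is explained.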
2. Work from the regression line and its errors:
a) Feed the X-values (column 1) and their corresponding Y-values (column 2) to either DDXL or
another statistics package to find the regression line.
b) Once we have the regression line we can plug each X-value (column 1) into the line to get our
predicted values (column 6).
c) We generate our errors (residuals, the mistakes made because we are using too simple a
regression model) by subtracting the predictions (column 6) from the actual data (column 2)
and place these in an error column (column 7).
d) Square each error (column 7 * column 7).
e) Add all those squared errors together. Call this SSE (sum of squared errors).
f) Use DDXL or another statistics package to find the variance of your Y-values and multiply that
number by (number of pairs – 1) so that you have SSY (sum of squares for Y).
g) Calculate (1 – (SSE/SSY)) * 100% as the percent of variation explained.
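Method 2 can be sketched the same way. Again this is only an illustration under stated assumptions: the data values are made up, the function name is hypothetical, and the least-squares slope and intercept are computed by hand here as a stand-in for DDXL or another statistics package.

```python
from statistics import mean, variance

def percent_explained_from_errors(xs, ys):
    """Return the percent of variation explained, computed as (1 - SSE/SSY) * 100."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    # Stand-in for the statistics package: ordinary least-squares line.
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = y_bar - slope * x_bar
    predictions = [intercept + slope * x for x in xs]    # column 6
    errors = [y - p for y, p in zip(ys, predictions)]    # column 7
    sse = sum(e * e for e in errors)                     # steps d and e
    ssy = variance(ys) * (n - 1)                         # step f (sample variance)
    return (1 - sse / ssy) * 100                         # step g

# Made-up data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
pct_from_errors = percent_explained_from_errors(xs, ys)
print(f"percent explained = {pct_from_errors:.1f}%")
```

For this made-up data SSE is 2.4 and SSY is 6, so the result is again about 60%, the same value the correlation method gives for a straight-line fit.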