Social Science Reasoning Using Statistics

Reasoning in Psychology
Using Statistics
Psychology 138
2015
• Quiz 8 due Fri. Apr 17
– Includes both correlation and regression
• Final Project due date Wed. April 29th (you should
get your cases assigned to you in labs today)
Announcements
• Lecture Exam 3
– Mean 53.6 (53.6/75 = 71.4%)
• Combined Exam 3
– Mean 116.1 (116.1/150 = 77.4%)
Exam(s) 3
• Lab Exam 3
– Mean 61.0 (61.0/75 = 81.3%)
• Regression procedures can be used to predict the response
variable based on the explanatory variable(s)
Suppose that you notice that the more you study for an exam,
the better your score typically is.
– This suggests that there is a relationship between the variables.
– You can use this relationship to predict test performance based on
study time.
[Figure: study time (15 mins vs. 115 mins) as a predictor of test performance]
Regression
• Regression
– Describing the nature of the relationship between
variables for the purposes of prediction
[Diagram: decision tree. Two variables; relationship between variables; quantitative variables; making predictions based on form of the relationship]
Decision tree
• For correlation: “it doesn’t matter which variable goes on the X-axis or the Y-axis”
• For regression this is NOT the case
– The variable that you are predicting (response variable) goes on the Y-axis
– The variable that you are making the prediction based on (explanatory variable) goes on the X-axis
[Figure: scatterplot axes from 1 to 6. Y-axis: quiz performance (predicted variable); X-axis: hours of study (predictor variable)]
• For correlation: “Imagine a line through the points”
• But there are lots of possible lines
• One line is the “best fitting line”
• Today: learn how to compute the equation corresponding to this “best fitting line”
[Figure: scatterplot of quiz performance (Y) against hours of study (X), axes from 1 to 6]
• A brief review of geometry
Y = (X)(slope) + (intercept)
– intercept = the value of Y when X = 0 (here, 2.0)
– slope = (change in Y) / (change in X) = 1 / 2 = 0.5
Y = (X)(0.5) + 2.0
– In regression analysis this line (or the equation that describes it) represents our predicted values of Y given particular values of X
[Figure: line with intercept 2.0 and slope 0.5, X and Y axes from 0 to 6]
• Consider a perfect correlation
– Can make specific predictions about Y based on X
– For X = 5, Y = ?
Y = (X)(0.5) + (2.0)
Y = (5)(0.5) + (2.0)
Y = 2.5 + 2 = 4.5
[Figure: points falling exactly on the line, with the prediction Y = 4.5 marked at X = 5; X and Y axes from 1 to 6]
• Consider a less than perfect correlation
– The line still represents the predicted values of Y given X
– For X = 5, Y = ?
Y = (X)(0.5) + (2.0)
Y = (5)(0.5) + (2.0)
Y = 2.5 + 2 = 4.5
[Figure: points scattered around the line, with the prediction Y = 4.5 marked at X = 5; X and Y axes from 1 to 6]
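The prediction step above can be sketched in a few lines of Python. This is only an illustration: the function name `predict` is an assumption made for this example, while the slope (0.5) and intercept (2.0) are the values from the geometry review.

```python
def predict(x, slope=0.5, intercept=2.0):
    """Predicted Y on the line Y = (X)(slope) + (intercept)."""
    return x * slope + intercept


# The prediction worked on the slide: X = 5
y_at_5 = predict(5)
print(y_at_5)  # 4.5

# When X = 0 the prediction is just the intercept
print(predict(0))  # 2.0
```

The same function gives the predicted value whether the correlation is perfect or not; what changes is how far the actual points fall from that prediction.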
• The “best fitting line” is the one that minimizes the squared differences (errors or residuals) between the predicted scores (the line) and the actual scores (the points)
• Rather than comparing the errors from different lines and picking the best, we will directly compute the equation for the best fitting line
[Figure: scatterplot with a line through the points, X and Y axes from 1 to 6]
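To see what "minimizes the error" means, here is a small sketch that does the comparison the lecture skips: it adds up the squared errors for a few candidate lines. The five (X, Y) pairs and the line 0.92X + 0.688 come from this lecture's example; the other two candidate lines and all of the names in the code are assumptions chosen for illustration.

```python
def sum_squared_errors(xs, ys, slope, intercept):
    """Sum of squared differences between each observed Y and the line's predicted Y."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


# Example dataset from this lecture: X = hours of study, Y = exam score
xs = [6, 1, 5, 3, 3]
ys = [6, 2, 6, 4, 2]

# Candidate lines as (slope, intercept); only the last one is the lecture's answer
candidates = {
    "flat line at the mean of Y": (0.0, 4.0),
    "line from the geometry review": (0.5, 2.0),
    "line computed in this lecture": (0.92, 0.688),
}

errors = {
    name: sum_squared_errors(xs, ys, slope, intercept)
    for name, (slope, intercept) in candidates.items()
}
for name, sse in errors.items():
    print(f"{name}: {sse:.2f}")

best = min(errors, key=errors.get)
print("smallest error:", best)
```

Of these three candidates, the computed line has the smallest total. Trying lines one at a time can never show that no better line exists, which is why the equation is computed directly.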
• Using the dataset from our correlation lecture
Suppose that you notice that the more you study for an exam (X= hours of
study), the better your exam score typically is (Y = exam score).
Compute the regression equation predicting exam score with study
time.
     X   Y
A    6   6
B    1   2
C    5   6
D    3   4
E    3   2
[Figure: scatterplot of the five cases, exam score (Y) against hours of study (X), axes from 1 to 6]
Example
       X     Y    (X - X̄)   (X - X̄)²   (Y - Ȳ)   (Y - Ȳ)²   (X - X̄)(Y - Ȳ)
A      6     6      2.4       5.76       2.0       4.0         4.8
B      1     2     -2.6       6.76      -2.0       4.0         5.2
C      5     6      1.4       1.96       2.0       4.0         2.8
D      3     4     -0.6       0.36       0.0       0.0         0.0
E      3     2     -0.6       0.36      -2.0       4.0         1.2
mean   3.6   4.0
sum                 0.0      15.20       0.0      16.0        14.0
                              SSX                  SSY         SP
SSX = 15.20, SSY = 16.0, SP = 14.0
slope = b = SP / SSX = 14.0 / 15.20 = 0.92
intercept = a = Ȳ - bX̄ = 4.0 - (0.92)(3.6) = 0.688
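The hand computation above can be checked with a short sketch. The function name `fit_line` and its return layout are assumptions made for this example; the formulas (b = SP / SSX, a = Ȳ - bX̄) and the data are the ones on the slides.

```python
def fit_line(xs, ys):
    """Return (slope, intercept, SSX, SP) for the best fitting line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # SSX: sum of squared deviations of X from its mean
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    # SP: sum of products of the X and Y deviations
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sp / ss_x
    intercept = mean_y - slope * mean_x
    return slope, intercept, ss_x, sp


xs = [6, 1, 5, 3, 3]  # hours of study
ys = [6, 2, 6, 4, 2]  # exam score

slope, intercept, ss_x, sp = fit_line(xs, ys)
print(round(slope, 2))  # 0.92

# Intercept computed the way the slide does it, from the rounded slope
intercept_from_rounded_slope = 4.0 - round(slope, 2) * 3.6
print(round(intercept_from_rounded_slope, 3))  # 0.688
```

One thing to watch: the slide rounds the slope to 0.92 before computing the intercept, which gives 0.688. Carrying the unrounded slope through gives an intercept of about 0.684, so small differences from software output are a rounding effect, not an error.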
Y = 0.92X + 0.688
– slope = b = 0.92
– intercept = 0.688
– The two means will be on the line (X̄ = 3.6, Ȳ = 4.0)
[Figure: the five cases with the regression line Y = 0.92X + 0.688, with the means X̄ and Ȳ marked; X and Y axes from 1 to 6]
• Hypothesis testing: 2 of these, one on each of the slope (b = 0.92) and the intercept (0.688)
[Figure: the five cases with the regression line Y = 0.92X + 0.688; X and Y axes from 1 to 6]
• SPSS Regression output gives you a lot of stuff
– Make sure you put the variables in the correct role
• Unstandardized coefficients
– “(Constant)” = intercept
– Variable name = slope
• These t-tests test hypotheses
– H0: Intercept (constant) = 0
– H0: Slope = 0
Hypothesis testing with Regression
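As a rough sketch of where the slope's t value comes from, the code below divides the slope by its standard error. The lecture does not give this formula: the standard error used here, (standard error of the estimate) / √SSX, is the usual textbook one and is an assumption in this context, as are the function and variable names. The data are the lecture's five cases.

```python
import math


def slope_t_statistic(xs, ys):
    """Return (t, df) for testing H0: slope = 0 in simple linear regression."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sp / ss_x
    intercept = mean_y - slope * mean_x
    # Sum of squared residuals around the fitted line
    ss_error = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    df = n - 2
    std_error_estimate = math.sqrt(ss_error / df)
    # Assumed textbook formula for the standard error of the slope
    se_slope = std_error_estimate / math.sqrt(ss_x)
    return slope / se_slope, df


xs = [6, 1, 5, 3, 3]
ys = [6, 2, 6, 4, 2]
t_value, df = slope_t_statistic(xs, ys)
print(f"t({df}) = {t_value:.2f}")
```

The p-value that SPSS prints next to this t comes from the t distribution with n - 2 degrees of freedom; that lookup is left out here to keep the sketch to the standard library.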
• The linear equation isn’t the whole thing
• Also need a measure of error
Y = X(.5) + (2.0) + error
• Same line, but different relationships (strength difference)
[Figure: two scatterplots with the same line, Y = X(.5) + (2.0), differing in how closely the points follow it; X and Y axes from 1 to 6]
Measures of Error in Regression
• Three common measures of error
– r² (r-squared)
– Sum of the squared residuals = SSresidual = SSerror
– Standard error of estimate
• R-squared (r²) represents the percent variance in Y accounted for by X
– r = 0.8, r² = 0.64: 64% of the variance in Y is explained by X
– r = 0.5, r² = 0.25: 25% of the variance in Y is explained by X
[Figure: two scatterplots, one with r = 0.8 and one with r = 0.5; X and Y axes from 1 to 6]
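For the lecture's own example data, r² can be sketched as follows. The formula r = SP / √(SSX · SSY) is the usual Pearson correlation written in sums-of-squares form; it is not shown on these slides, so treat it (and the function name) as an assumption brought in for illustration.

```python
import math


def r_and_r_squared(xs, ys):
    """Return (r, r squared) using r = SP / sqrt(SSX * SSY)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    r = sp / math.sqrt(ss_x * ss_y)
    return r, r ** 2


xs = [6, 1, 5, 3, 3]  # hours of study
ys = [6, 2, 6, 4, 2]  # exam score
r, r2 = r_and_r_squared(xs, ys)
print(f"r = {r:.2f}, r squared = {r2:.2f}")
print(f"{r2:.0%} of the variance in Y is accounted for by X")
```

With SSX = 15.20, SSY = 16.0, and SP = 14.0 this comes out to roughly 0.81, so about 81% of the variance in exam score is accounted for by study time in this small dataset.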
• Sum of the squared residuals = SSresidual = SSerror
– Compute the difference between the predicted values and the observed values (“residuals”)
– Square the differences
– Add up the squared differences
[Figure: scatterplot with the regression line, X and Y axes from 1 to 6]
Ŷ = 0.92X + 0.688
Predicted values of Y (points on the line)
       X     Y     Ŷ
       6     6     6.2  = (0.92)(6) + 0.688
       1     2     1.6  = (0.92)(1) + 0.688
       5     6     5.3  = (0.92)(5) + 0.688
       3     4     3.45 = (0.92)(3) + 0.688
       3     2     3.45 = (0.92)(3) + 0.688
mean   3.6   4.0
Residuals (Y - Ŷ)
       X     Y     Ŷ      Y - Ŷ
       6     6     6.2    6 - 6.2  = -0.20
       1     2     1.6    2 - 1.6  =  0.40
       5     6     5.3    6 - 5.3  =  0.70
       3     4     3.45   4 - 3.45 =  0.55
       3     2     3.45   2 - 3.45 = -1.45
mean   3.6   4.0
Quick check: the residuals sum to 0.00
[Figure: the five cases with the line Ŷ = 0.92X + 0.688; predicted values 6.2, 5.3, 3.45, and 1.6 marked on the Y-axis]
       X     Y     Ŷ      (Y - Ȳ)²   (Y - Ŷ)   (Y - Ŷ)²
       6     6     6.2      4.0       -0.20      0.04
       1     2     1.6      4.0        0.40      0.16
       5     6     5.3      4.0        0.70      0.49
       3     4     3.45     0.0        0.55      0.30
       3     2     3.45     4.0       -1.45      2.10
mean   3.6   4.0
sum                        16.0        0.00      3.09
                            SSY                  SSerror
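The three steps (compute residuals, square them, add them up) can be sketched as below, using the lecture's data and its rounded line Ŷ = 0.92X + 0.688. The variable names are assumptions made for this example.

```python
xs = [6, 1, 5, 3, 3]  # hours of study
ys = [6, 2, 6, 4, 2]  # exam score
slope, intercept = 0.92, 0.688  # rounded values from the lecture

# Step 1: predicted values (points on the line) and residuals (Y - predicted Y)
predicted = [slope * x + intercept for x in xs]
residuals = [y - p for y, p in zip(ys, predicted)]

# Steps 2 and 3: square the residuals and add them up
ss_error = sum(res ** 2 for res in residuals)

# For comparison: SSY, the squared deviations of Y from its own mean
mean_y = sum(ys) / len(ys)
ss_y = sum((y - mean_y) ** 2 for y in ys)

print("residuals:", [round(res, 2) for res in residuals])
print("quick check, sum of residuals:", round(sum(residuals), 2))
print("SSerror:", round(ss_error, 2), "SSY:", round(ss_y, 2))
```

This prints an SSerror of about 3.11 rather than the 3.09 in the table. The table rounds each predicted value (6.2, 1.6, 5.3, 3.45) before squaring, while the code keeps full precision, so the gap is rounding only.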
• Standard error of the estimate represents the average deviation from the line
standard error of the estimate = √(SSerror / df), where df = n - 2
= √(SSerror / (n - 2))
= √(3.09 / 3) = 1.01
[Figure: scatterplot with the regression line, X and Y axes from 1 to 6]
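A minimal sketch of this last computation, assuming the SSerror of 3.09 and n = 5 from the example (the function name is made up for illustration):

```python
import math


def standard_error_of_estimate(ss_error, n):
    """Square root of SSerror divided by its degrees of freedom (n - 2)."""
    df = n - 2
    return math.sqrt(ss_error / df)


# Values from the lecture example: SSerror = 3.09, n = 5 cases
se_est = standard_error_of_estimate(3.09, 5)
print(round(se_est, 2))  # 1.01
```

The divisor is n - 2 rather than n because two quantities, the slope and the intercept, were estimated from the data before the residuals could be computed.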
• SPSS Regression output gives
you a lot of stuff
• r²
– percent variance in Y
accounted for by X
• Standard error of the
estimate
– the average deviation
from the line
• SSresiduals or SSerror
• You’ll practice computing the regression
equation and error for the “best fitting line”
(by hand and using SPSS)
In lab