Basic linear regression and multiple regression
Psych 350
Lecture 12 : R. Chris Fraley
http://www.yourpersonality.net/psych350/fall2012/
Example
• Let’s say we wish to model the relationship between coffee
consumption and happiness
[Scatterplot of the example data: COFFEE on the x-axis, HAPPINESS on the y-axis]
Some Possible Functions
[Figure: five candidate functions plotted over x from -4 to 4: 1 + 0·x, 1 + 2·x, 1 - 2·x, 1 + 2·x², 1 - 2·x²]
Lines
• Linear relationships
• Y = a + bX
– a = Y-intercept (the value of Y when X = 0)
– b = slope (the “rise over the run”, the steepness of the line); a weight
[Figure: the line Y = 1 + 2X plotted over x from -4 to 4]
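To make the linear form concrete, here is a small Python sketch (not part of the original slides; the coffee/happiness framing is just the running example) that evaluates Y = a + bX over a grid of X values and shows the slope as the rise over the run.

```python
import numpy as np

x = np.arange(-4, 5)   # X values matching the slide's -4..4 axis
a, b = 1, 2            # intercept and slope for the line Y = 1 + 2X

y = a + b * x          # implied Y values
print(dict(zip(x.tolist(), y.tolist())))

# The slope is the "rise over run": a 1-unit increase in X changes Y by b units.
print(y[5] - y[4])     # x goes from 0 to 1 -> y rises from 1 to 3, i.e. by 2
```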
Lines and intercepts
• Y = a + 2X
• Notice that the implied
values of Y go up as we
increase a.
• By changing a, we are
changing the elevation
of the line.
[Figure: the lines Y = 1 + 2X, Y = 3 + 2X, and Y = 5 + 2X plotted on HAPPINESS vs. COFFEE axes]
Lines and slopes
• Slope as “rise over run”: how much of a change in Y is there given a 1 unit increase in X.
• As we move up 1 unit on X, we go up 2 units on Y
• 2/1 = 2 (the slope)
[Figure: the line Y = 1 + 2X, with the run (moving from 0 to 1 on X) and the rise (from 1 to 3 on Y, a 2 unit change) marked]
Lines and slopes
• Notice that as we
increase the slope, b,
we increase the
steepness of the line
[Figure: the lines Y = 1 + 2X and Y = 1 + 4X plotted on HAPPINESS vs. COFFEE axes]
Lines and slopes
[Figure: lines with slopes b = 4, 2, 0, -2, and -4 plotted on HAPPINESS vs. COFFEE axes]
• We can also have
negative slopes and
slopes of zero.
• When the slope is zero,
the predicted values of
Y are equal to a.
Y = a + 0X
Y = a
Other functions
• Quadratic function
• Y = a + bX²
– a still represents the intercept (value of Y when X = 0)
– b still represents a weight, and influences the magnitude of the squaring function
[Figure: a quadratic curve plotted on HAPPINESS vs. COFFEE axes]
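As a quick illustration of the quadratic form (my own sketch, not from the lecture), the function below evaluates Y = a + bX² and prints a few variants that mirror the curves on the following slides.

```python
import numpy as np

x = np.arange(-4, 5)

def quadratic(x, a, b):
    """Y = a + b * X**2: a is the intercept (Y at X = 0), b weights the squared term."""
    return a + b * x**2

print(quadratic(x, a=0, b=1))   # baseline curve
print(quadratic(x, a=5, b=1))   # larger a -> same shape, higher elevation
print(quadratic(x, a=0, b=5))   # larger b -> the quadratic effect is accentuated
print(quadratic(x, a=0, b=-1))  # negative b -> the curve is flipped upside-down
```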
Quadratic and intercepts
• As we increase a, the
elevation of the curve
increases
[Figure: the curves Y = 0 + 1X² and Y = 5 + 1X² plotted on HAPPINESS vs. COFFEE axes]
Quadratic and Weight
• When we increase the
weight, b, the quadratic
effect is accentuated
[Figure: the curves Y = 0 + 1X² and Y = 0 + 5X² plotted on HAPPINESS vs. COFFEE axes]
Quadratic and Weight
• As before, we can have
negative weights for
quadratic functions.
• In this case, negative
values of b flip the
curve upside-down.
• As before, when b = 0,
the value of Y = a for
all values of X.
[Figure: the curves Y = 0 + 5X², Y = 0 + 1X², Y = 0 + 0X², Y = 0 - 1X², and Y = 0 - 5X² plotted on HAPPINESS vs. COFFEE axes]
Linear & Quadratic Combinations
[Figure: a 3 × 3 grid of curves Y = b1·X + b2·X², crossing linear weights b1 = -2, 0, 2 with quadratic weights b2 = -1, 0, 1]
• When linear and quadratic terms are present in the same equation, one can derive J-shaped curves
• Y = a + b1X + b2X²
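To see how the two weights combine, here is a brief sketch (illustrative code, not from the slides) that evaluates Y = a + b1·X + b2·X² for a few of the weight combinations in the grid above.

```python
import numpy as np

x = np.linspace(-4, 4, 9)

def curve(x, a, b1, b2):
    """Y = a + b1*X + b2*X**2: a linear and a quadratic term in one equation."""
    return a + b1 * x + b2 * x**2

# A positive linear weight plus a positive quadratic weight gives a J-shaped curve.
print(curve(x, a=0, b1=2, b2=1))
# A linear weight with a zero quadratic weight is just a straight line.
print(curve(x, a=0, b1=2, b2=0))
# A negative quadratic weight bends the line downward instead.
print(curve(x, a=0, b1=2, b2=-1))
```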
Some terminology
• When the relations between variables are expressed in this manner, we call the relevant equation(s) mathematical models.
• The intercept and weight values are called parameters of the model.
• Although one can describe the relationship between two variables in the way we have done here, from now on we’ll assume that our models are causal models, such that the variable on the left-hand side of the equation is assumed to be caused by the variable(s) on the right-hand side.
Terminology
• The values of Y in these models are often called predicted values, sometimes abbreviated as Y-hat or Ŷ. Why? They are the values of Y that are implied by the specific parameters of the model.
Estimation
• Up to this point, we have assumed that our models are
correct.
• There are two important issues we need to deal with,
however:
– Assuming the basic model is correct (e.g., linear), what
are the correct parameters for the model?
– Is the basic form of the model correct? That is, is a
linear, as opposed to a quadratic, model the appropriate
model for characterizing the relationship between
variables?
Estimation
• The process of obtaining the correct parameter values
(assuming we are working with the right model) is called
parameter estimation.
Parameter Estimation example
• Let’s assume that we believe
there is a linear relationship
between X and Y.
• Assume we have collected the
following data
• Which set of parameter values
will bring us closest to
representing the data
accurately?
[Scatterplot of the observed data, y plotted against x]
Estimation example
Ŷ = 2 + (-2)X
• We begin by picking some values, plugging them into the linear equation, and seeing how well the implied values correspond to the observed values
• We can quantify what we mean by “how well” by examining the difference between the model-implied Y and the actual Y value
• This difference, (y - ŷ), is often called error in prediction
[Figure: the data with the line Ŷ = 2 + (-2)X; the errors (y - ŷ) are -8, -4, 0, 4, 8, and the squared errors sum to 160]
Estimation example
Ŷ = 2 + (-1)X
• Let’s try a different value of b and see what happens
• Now the implied values of Y are getting closer to the actual values of Y, but we’re still off by quite a bit
[Figure: the data with the line Ŷ = 2 + (-1)X; the errors are -6, -3, 0, 3, 6, and the squared errors sum to 90]
Estimation example
Ŷ = 2 + 0X
• Things are getting better, but certainly things could improve
[Figure: the data with the line Ŷ = 2 + 0X; the errors are -4, -2, 0, 2, 4, and the squared errors sum to 40]
Estimation example
Ŷ = 2 + 1X
• Ah, much better
[Figure: the data with the line Ŷ = 2 + 1X; the errors are -2, -1, 0, 1, 2, and the squared errors sum to 10]
Estimation example
Ŷ = 2 + 2X
• Now that’s very nice
• There is a perfect correspondence between the implied values of Y and the actual values of Y
[Figure: the data with the line Ŷ = 2 + 2X; every error is 0, so the squared errors sum to 0]
Estimation example
Ŷ = 2 + 3X
• Whoa. That’s a little worse.
• Simply increasing b doesn’t seem to make things increasingly better
[Figure: the data with the line Ŷ = 2 + 3X; the errors are 2, 1, 0, -1, -2, and the squared errors sum to 10]
Estimation example
Ŷ = 2 + 4X
• Ugh. Things are getting worse again.
[Figure: the data with the line Ŷ = 2 + 4X; the errors are 4, 2, 0, -2, -4, and the squared errors sum to 40]
Parameter Estimation example
• Here is one way to think about what we’re doing:
– We are trying to find a set of parameter values that will
give us a small—the smallest—discrepancy between
the predicted Y values and the actual values of Y.
• How can we quantify this?
Parameter Estimation example
• One way to do so is to find the difference between each
value of Y and the corresponding predicted value (we
called these differences “errors” before), square these
differences, and average them together
Σ(Y - Ŷ)² / N
Parameter Estimation example
• The form of this equation should be familiar. Notice that
it represents some kind of average of squared deviations
• This average is often called error variance.
Σ(Y - Ŷ)² / N
Parameter Estimation example
• In estimating the parameters of our model, we are trying to find a set of parameters that minimizes the error variance. In other words, we want Σ(Y - Ŷ)² / N to be as small as it possibly can be.
• The process of finding this minimum value is called least-squares estimation.
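The search we just walked through can be written out directly. The sketch below is my own code, using the example data implied by the slides (X = -2 to 2, Y = 2 + 2X); it reproduces the per-slide totals 160, 90, 40, 10, 0, 10, 40 as sums of squared errors, and dividing by N gives the error variance.

```python
import numpy as np

# Example data implied by the slides: Y is exactly 2 + 2X.
x = np.array([-2, -1, 0, 1, 2])
y = 2 + 2 * x

a = 2
for b in range(-2, 5):
    y_hat = a + b * x
    errors = y - y_hat              # errors in prediction, y - y_hat
    sse = np.sum(errors**2)         # sum of squared errors: 160, 90, 40, 10, 0, 10, 40
    error_variance = sse / len(y)   # the averaged version that gets plotted against b
    print(b, errors.tolist(), sse, error_variance)
```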
Parameter Estimation example
• In this graph I have plotted the error variance as a function of the different parameter values we chose for b.
• Notice that our error was large at first (at b = -2), but got smaller as we made b larger. Eventually, the error reached a minimum when b = 2 and, then, began to increase again as we made b larger.
[Figure: “Different values of b”, error variance plotted against the parameter values -2 through 4]
Parameter Estimation example
• The minimum in this example occurred when b = 2. This is the “best” value of b, when we define “best” as the value that minimizes the error variance.
• There is no other value of b that will make the error smaller. (0 is as low as you can go.)
[Figure: “Different values of b”, error variance plotted against the parameter values -2 through 4, with the minimum at b = 2]
Ways to estimate parameters
• The method we just used is sometimes called the brute force or gradient descent method of estimating parameters.
– More formally, gradient descent involves starting with a viable parameter value, calculating the error using slightly different values, moving the best-guess parameter value in the direction of the smallest error, then repeating this process until the error is as small as it can be.
• Analytic methods
– With simple linear models, the equation is so simple that brute force methods are unnecessary.
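Here is a rough sketch of the gradient descent idea under my own assumptions (the step size, stopping rule, and numerical gradient are arbitrary choices, not the course's): start from a guess for b, estimate how the error changes for slightly different values, and move b in the direction that lowers the error.

```python
import numpy as np

x = np.array([-2, -1, 0, 1, 2])
y = 2 + 2 * x                      # same example data as before

def error_variance(b, a=2):
    return np.mean((y - (a + b * x))**2)

b, step = -2.0, 0.1                # initial guess and step size
for _ in range(1000):
    # Numerically estimate how the error changes for slightly different values of b.
    grad = (error_variance(b + 1e-6) - error_variance(b - 1e-6)) / 2e-6
    new_b = b - step * grad        # move b in the direction of smaller error
    if abs(new_b - b) < 1e-8:      # stop when the change is negligible
        break
    b = new_b

print(round(b, 4))                 # converges to 2, the least-squares value
```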
Analytic least-squares estimation
• Specifically, one can use calculus to find the
values of a and b that will minimize the error
function
(1/N) Σ(Y - Ŷ)²   where   Ŷ = a + bX

or

(1/N) Σ(Y - (a + bX))²
Analytic least-squares estimation
• When this is done (we won’t actually do the calculus here), we obtain the following equations:

b = r_XY (s_Y / s_X)

a = M_Y - b M_X   or   a = M_Y - r_XY (s_Y / s_X) M_X
Analytic least-squares estimation
• Thus, we can easily find the least-squares
estimates of a and b from simple knowledge of (1)
the correlation between X and Y, (2) the SD’s of
X and Y, and (3) the means of X and Y:
b = r_XY (s_Y / s_X)

a = M_Y - b M_X
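A quick numeric check of these formulas (an illustrative sketch; the data below are made up): compute the correlation, the SDs, and the means, then b and a, and compare with an ordinary least-squares fit.

```python
import numpy as np

# Made-up data for illustration only.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 1.5, 4.0, 3.5, 5.0])

r = np.corrcoef(x, y)[0, 1]
b = r * y.std() / x.std()          # b = r_XY * (s_Y / s_X)
a = y.mean() - b * x.mean()        # a = M_Y - b * M_X

print(a, b)
print(np.polyfit(x, y, 1))         # least-squares fit for comparison: [slope, intercept]
```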
A neat fact
• Notice what happens when X and Y are in
standard score form
b = r_XY (s_Y / s_X) = r_XY (1 / 1) = r_XY

a = M_Y - b M_X = 0 - b(0) = 0

• Thus,

ẑ_Y = r_XY z_X
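A small check of the standard-score case (again an illustrative sketch with made-up data): after z-scoring X and Y, the fitted slope equals r and the intercept is essentially 0.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 1.5, 4.0, 3.5, 5.0])

zx = (x - x.mean()) / x.std()
zy = (y - y.mean()) / y.std()

slope, intercept = np.polyfit(zx, zy, 1)
print(slope, np.corrcoef(x, y)[0, 1])   # slope of the standardized regression equals r
print(round(intercept, 10))             # intercept is 0 (up to floating-point error)
```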
• In the parameter estimation example, we dealt with a
situation in which a linear model of the form Y = 2 + 2X
perfectly accounted for the data. (That is, there was no
discrepancy between the values implied by the model and
the actual data.)
• Even when this is not the case (i.e., when the model doesn’t explain the data perfectly), we can still find least-squares estimates of the parameters.
[Figures: a data set that is not perfectly linear, the predictions implied by several values of b, and the error plotted against the parameter values]
Error Variance
• In this example, the value of b that minimizes the error
variance is also 2. However, even when b = 2, there are
discrepancies between the predictions entailed by the
model and the actual data values.
• Thus, the error variance becomes not only a way to
estimate parameters, but a way to evaluate the basic model
itself.
R-squared
• In short, when the model is a good representation of the
relationship between Y and X, the error variance of the
model should be relatively low.
• This is typically quantified by an index called the multiple R or the squared version of it, R².
R-squared
R² = 1 - (VAR_error / VAR_Y)
• R-squared represents the proportion of the variance in Y that is accounted for by the model
• When the model doesn’t do any better than guessing the mean, R² will equal zero. When the model is perfect (i.e., it accounts for the data perfectly), R² will equal 1.00.
Neat fact
• When dealing with a simple linear model with one X, R² is equal to the correlation of X and Y, squared.
• Why? Keep in mind that R² is in a standardized metric by virtue of having divided the error variance by the variance of Y. Previously, when working with standardized scores in simple linear regression equations, we found that the parameter b is equal to r. Since b is estimated via least-squares techniques, it is directly related to R².
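The following sketch (my code, made-up data) computes R² as 1 minus the ratio of the error variance to the variance of Y, and checks that with a single predictor it matches the squared correlation.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 1.5, 4.0, 3.5, 5.0])

b, a = np.polyfit(x, y, 1)          # least-squares slope and intercept
y_hat = a + b * x

var_error = np.mean((y - y_hat)**2)
var_y = np.var(y)

r_squared = 1 - var_error / var_y   # R^2 = 1 - (VAR_error / VAR_Y)
print(r_squared, np.corrcoef(x, y)[0, 1]**2)   # the two values agree
```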
Why is R² useful?
• R² is useful because it is a standard metric for interpreting model fit.
– It doesn’t matter how large the variance of Y is because everything is evaluated relative to the variance of Y
– Set end-points: 1 is perfect and 0 is as bad as a model can be.
Multiple Regression
• In many situations in personality psychology we are
interested in modeling Y not only as a function of a single
X variable, but potentially many X variables.
• Example: We might attempt to explain variation in
academic achievement as a function of SES and maternal
education.
• Y = a + b1*SES + b2*MATEDU
• Notice that “adding” a new variable to the model is simple.
This equation states that Y, academic achievement, is a
function of at least two things, SES and MATEDU.
• However, what the regression coefficients now represent is not merely the change in Y expected given a 1-unit increase in X. They represent the change in Y given a 1-unit change in X, holding all the other variables in the equation constant.
• In other words, these coefficients are kind of like partial correlations (technically, they are related to semi-partial correlations). We’re statistically controlling SES when estimating the effect of MATEDU.
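The lecture estimates these coefficients in SPSS; purely as an illustration, the sketch below does the same kind of estimation in Python with invented SES, MATEDU, and achievement scores (only the variable names come from the example, the numbers and the generating model are assumptions).

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented data for illustration; only the variable names come from the lecture.
ses = rng.normal(size=200)
matedu = 0.5 * ses + rng.normal(size=200)              # correlated with SES
achieve = 75 + 1.5 * matedu + 0.5 * ses + rng.normal(size=200)

# Design matrix with a column of 1s for the intercept a.
X = np.column_stack([np.ones_like(ses), matedu, ses])
coefs, *_ = np.linalg.lstsq(X, achieve, rcond=None)

a, b_matedu, b_ses = coefs
print(a, b_matedu, b_ses)   # each b is the change in Y per 1-unit change in that
                            # predictor, holding the other predictor constant
```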
• Estimating regression coefficients in SPSS
Correlations

             SES     MATEDU   ACHIEVEG5
SES          1.00    .542     .279
MATEDU       .542    1.00     .364
ACHIEVEG5    .279    .364     1.00
Note: The regression
parameter estimates
are in the column
labeled B. Constant
= a = intercept
Achievement = 76.86 + 1.443*MATEDU + .539*SES
Ŷ = 76.86 + 1.443*MATEDU + .539*SES
• These parameter estimates imply that moving up one unit on maternal education leads to a 1.4-unit increase in achievement.
• Moreover, moving up 1 unit on SES corresponds to a half-unit increase in achievement.
• Does this mean that Maternal Education matters more than
SES in predicting educational achievement?
• Not necessarily. As it stands, the two variables might be on
very different metrics. (Perhaps MATEDU ranges from 0
to 20 and SES ranges from 0 to 4.) To evaluate their
relative contributions to Y, one can standardize both
variables or examine standardized regression coefficients.
Z(Achievement) = 0 + .301*Z(MATEDU) + .118*Z(SES)
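One way to obtain standardized coefficients like these is simply to z-score every variable before fitting. The sketch below is illustrative only (the data are invented and will not reproduce the .301 and .118 reported above); it just shows the mechanics.

```python
import numpy as np

rng = np.random.default_rng(0)
ses = rng.normal(size=200)                                        # invented data
matedu = 0.5 * ses + rng.normal(size=200)
achieve = 75 + 1.5 * matedu + 0.5 * ses + rng.normal(size=200)

def z(v):
    return (v - v.mean()) / v.std()

# Fit the same model after putting every variable in standard-score form.
X = np.column_stack([np.ones(200), z(matedu), z(ses)])
coefs, *_ = np.linalg.lstsq(X, z(achieve), rcond=None)
print(coefs)   # intercept is ~0; the other values are standardized (beta) weights,
               # comparable because both predictors are now on the same metric
```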
The multiple R and the R
squared for the full model
are listed here.
This particular model
explains 14% of the
variance in academic
achievement
Adding SES*SES (SES²) improves R-squared by about 1%
These parameters suggest
that higher SES predicts
higher achievement, but
in a limiting way. There
are diminishing returns on
the high end of SES.
Ẑ_Y = 0 + .256*Z(MATEDU) + .436*Z(SES) - .320*Z(SES)*Z(SES)
SES    a    B1*MATEDU   B2*SES     B3*SES*SES     Y-hat
-2     0    .256*0      .436*-2    -.320*-2*-2    -2.15
-1     0    .256*0      .436*-1    -.320*-1*-1    -0.76
 0     0    .256*0      .436*0     -.320*0*0       0.00
 1     0    .256*0      .436*1     -.320*1*1       0.12
 2     0    .256*0      .436*2     -.320*2*2      -0.41
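The Y-hat column can be reproduced directly from the standardized equation. The short sketch below plugs Z(SES) = -2 through 2 into the equation with Z(MATEDU) held at 0, as the table does.

```python
z_ses = [-2, -1, 0, 1, 2]
z_matedu = 0   # held at its mean, as in the table

for zs in z_ses:
    y_hat = 0 + 0.256 * z_matedu + 0.436 * zs - 0.320 * zs * zs
    print(zs, round(y_hat, 2))   # -2.15, -0.76, 0.00, 0.12, -0.41:
                                 # diminishing returns at the high end of SES
```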
[Figure: Predicted Z(Achievement) plotted against Z(SES) from -2 to 2]