Class 1: Regression Review

Regression Review
Sociology 229A
Copyright © 2008 by Evan Schofer
Do not copy or distribute without
permission
Linear Functions
• Formula: Y = a + bX
– Is a linear formula. If you graphed X and Y for any
chosen values of a and b, you’d get a straight line.
– It is a family of functions: For any value of a and b,
you get a particular line
• a is referred to as the “constant” or “intercept”
• b is referred to as the “slope”
• To graph a linear function: Pick values for X,
compute corresponding values of Y
• Then, connect dots to graph line
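The "pick values, compute, connect the dots" recipe can be sketched in a few lines of Python (the values of a and b are illustrative, not from the slides):

```python
# Tabulate points on the line Y = a + bX, as you would before
# connecting the dots by hand. The values of a and b are illustrative.
a, b = 3.0, -1.5   # intercept (constant) and slope

for x in range(-10, 11, 5):
    y = a + b * x
    print(f"X = {x:3d}  ->  Y = {y:6.1f}")
```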
Linear Functions: Y = a + bX
• The “constant” or “intercept” (a)
– Determines where the line intersects the Y-axis
– If a increases (decreases),
line moves up (down)
[Figure: three parallel lines with slope −1.5 and different intercepts, plotted for X from −10 to 10 and Y from −20 to 20: Y = 14 − 1.5X, Y = 3 − 1.5X, Y = −9 − 1.5X]
Linear Functions: Y = a + bX
• The slope (b) determines the steepness of the line
[Figure: three lines with similar intercepts but different slopes, plotted for X from −10 to 10 and Y from −20 to 20: Y = 3 − 1.5X, Y = 2 + .2X, Y = 3 + 3X]
Linear Functions: Slopes
• The slope (b) is the ratio of change in Y to change
in X
[Figure: the line Y = 3 + 3X. A change in X of 5 produces a change in Y of 15, so the slope is b = 15/5 = 3]
• The slope tells you how many points Y will increase for any single point increase in X
Linear Functions as Summaries
• A linear function can be used to summarize the
relationship between two variables:
[Figure: scatterplot of happiness (0–10) against INCOME ($0–$100,000) with a summary line. A change in X of $40K accompanies a change in Y of 2 points, so the slope is b = 2/40K = .05 pts/$1K]
• If you change units:
– b = .00005 pts/$1
– b = .5 pts/$10K
– b = 5 pts/$100K
Linear Functions as Summaries
• Slope and constant can be “eyeballed” to
approximate a formula: Happy = 2 + .05Income
[Figure: the same happiness-vs-INCOME scatterplot. Slope (b): b = 2/40K = .05 pts/$1K. Constant (a) = value where the line hits the Y-axis; here a = 2]
Linear Functions as Summaries
• Linear functions can powerfully summarize data:
– Formula: Happy = 2 + .05Income
• Gives a sense of how the two variables are related
– Namely, people get a .05 increase in happiness for
every extra 1K dollar of income (or 5 pts per $100K)
• Also lets you “predict” values. What if someone
earns $150,000?
– Happy = 2 + .05($150K) = 9.5
• But be careful… You shouldn’t assume that a
relationship remains linear indefinitely
– Also, negative income or happiness make no sense…
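The prediction and its limits can be written out as a short sketch (the function name is ours, not from the slides):

```python
# Prediction from the slide's summary line, Happy = 2 + .05*Income,
# with income measured in thousands of dollars.
def predicted_happiness(income_in_thousands):
    return 2 + 0.05 * income_in_thousands

print(predicted_happiness(150))  # the slide's $150K example: 9.5
# Extrapolation warning: at $200K the line predicts 12, which is off
# the top of a 0-10 happiness scale.
print(predicted_happiness(200))
```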
Linear Functions as Summaries
• Come up with a linear function that summarizes
this real data: years of education vs. job prestige
[Figure: scatterplot of real data, years of education (EDUCATN, 0–20) vs. job prestige (−40 to 100)]
• It isn’t always easy! The line you choose depends on how much you “weight” these points.
Linear Functions as Summaries
• One estimate of the linear function
[Figure: the education vs. prestige scatterplot with a hand-drawn line. The line meets the Y-axis at Y = 5, so a = 5. The line increases to about 65 as X reaches 20: an increase of 60 in Y per 20 in X, so b = 60/20 = 3. Formula: Y = 5 + 3X]
Linear Functions as Summaries
• Questions:
• How much additional job prestige do you get by
going to college (an extra 4 years of education)?
– Formula: Prestige = 5 + 3*Education
• Answer: About 12 points of job prestige
• Change in X is 4… Slope is 3. 3 x 4 = 12 points
• If X=12, Y=5+3*12 = 41; If X=16, Y=5+3*16 = 53
• What is the interpretation of the constant?
• It is the predicted job prestige of someone with
zero years of education… (Prestige = 5)
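The arithmetic in the answers above can be checked directly (a small sketch using the slide's eyeballed formula):

```python
# Prestige = 5 + 3 * Education (the slide's eyeballed line)
def prestige(years_of_education):
    return 5 + 3 * years_of_education

print(prestige(12), prestige(16))     # 41 and 53, as on the slide
print(prestige(16) - prestige(12))    # 12: slope 3 times a 4-year change in X
```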
Linear Functions as Prediction
• Linear functions can summarize the relationship
between two variables:
– Formula: Happy = 2 + .05Income (in 1,000s)
• Linear functions can also be used to “predict”
(estimate) a case’s value on one variable (Yi) based on
its value on another variable (Xi)
– If you know the constant and slope
• “Y-hat” indicates an estimation function:
Ŷi = a + bYX·Xi
• bYX denotes the slope of Y with respect to X
Prediction with Linear Functions
• If Xi (Income) = 60K, what is our estimate
of Yi (Happiness)? Happy = 2 + .05Income
[Figure: the happiness-vs-INCOME scatterplot with the line Happy = 2 + .05·Income. Happiness-hat = 2 + .05(60) = 5. There is a case with Income = 60K; the prediction is imperfect, since that case falls at Y = 5.3 (above the line)]
The Linear Regression Model
• To model real data, we must take into account that
points will miss the line
– Similar to ANOVA, we refer to the deviation of points
from the estimated value as “error” (ei)
• In ANOVA the estimated value is: the group mean
– i.e., the grand mean plus the group effect
• In regression the estimated value is derived from
the formula Y = a + bX
– Estimation is based on the value of X, slope, and
constant (assumes linear relationship between X and Y)
The Linear Regression Model
• The value of any point (Yi) can be modeled as:
Yi = a + bYX·Xi + ei
• The value of Y for case (i) is made up of
• A constant (a)
• A sloping function of the case’s value on
variable X (bYX)
• An error term (e), the deviation from the line
• By adding error (e), an abstract mathematical
function can be applied to real data points
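One way to see how the error term turns an abstract line into a model for real data is to simulate it (a sketch; the parameters and the normal noise distribution are our illustrative choices, not from the slides):

```python
import random

# Simulate Y_i = a + b*X_i + e_i: each case is the line's value plus
# a case-specific error draw.
random.seed(1)                       # reproducible illustration
a, b = 2.0, 0.5

for i, x in enumerate([-4, -2, 0, 2, 4], start=1):
    e = random.gauss(0, 1)           # error term: deviation from the line
    y = a + b * x + e                # observed value for case i
    print(f"case {i}: X = {x:2d}, line = {a + b * x:4.1f}, e = {e:+.2f}, Y = {y:5.2f}")
```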
The Linear Regression Model
• Visually: Yi = a + bXi + ei
[Figure: the line Y = 2 + .5X with one point highlighted. Case 7: X = 3, Y = 5. Constant (a) = 2; bXi = 3(.5) = 1.5; error e = 5 − 3.5 = 1.5]
Estimating Linear Equations
• Question: How do we choose the best line to
describe our real data?
– Previously, we just “eyeballed” it
• Answer: Look at the error
– If a given line formula misses points by a lot, the
observed error will be large
– If the line is as close to all points as possible,
observed error will be small
• Of course, even the best line has some error
– Except when all data points are perfectly on a line
Estimating Linear Equations
• A poor estimation (big error)
[Figure: a scatter of points with a poorly fitting line, Y = 1.5 − 1X; most points fall far from the line]
Estimating Linear Equations
• Better estimation (less error)
[Figure: the same scatter with a better-fitting line, Y = 2 + .5X; points fall close to the line]
Estimating Linear Equations
• Look at the improvement (reduction) in error:
[Figure: the two fitted lines side by side: high error vs. low error]
Estimating Linear Equations
• Idea: The “best” line is the one that has the least
error (deviation from the line)
• Total deviation from the line can be expressed as:
Σi (Yi − Ŷi) = Σi ei    (summed over i = 1 to N)
• But, to make all deviation positive, we square it,
producing the “sum of squares error”
Σi (Yi − Ŷi)² = Σi ei²
Estimating Linear Equations
• Goal: Find values of constant (a) and slope (b)
that produce the lowest squared error
– The “least squares” regression line
• The formula for the slope (b) that yields the “least
squares error” is:
bYX = sYX / s²X
• Where s²X is the variance of X
• And sYX is the covariance of Y and X.
Covariance
• Variance: sum of squared deviations about Y-bar, divided by N − 1:
s²Y = Σi (Yi − Ȳ)² / (N − 1)
• Covariance (sYX): sum of deviations about Y-bar multiplied by
deviations about X-bar, divided by N − 1:
sYX = Σi (Yi − Ȳ)(Xi − X̄) / (N − 1)
Covariance
• Covariance: A measure of how much deviation of a case in X is
accompanied by deviation in Y
• It measures whether deviation (from mean) in X
tends to be accompanied by similar deviation in Y
• Or if cases with positive deviation in X have negative
deviation in Y
• This is summed up for all cases in the data
• The covariance is one numerical measure that
characterizes the extent of linear association
• As is the correlation coefficient (r).
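A minimal sketch of the covariance formula above (the data are illustrative):

```python
# Covariance: paired deviations from the means, summed, divided by N - 1.
def covariance(xs, ys):
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)

# Same-signed deviations -> positive covariance (positive association)
print(covariance([1, 2, 3, 4], [10, 20, 30, 40]))
# Opposite-signed deviations -> negative covariance
print(covariance([1, 2, 3, 4], [40, 30, 20, 10]))
```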
Regression Example
• Example: Study time and student achievement.
– X variable: Average # hours spent studying per day
– Y variable: Score on reading test
Case   X     Y
1      2.6   28
2      1.4   13
3      .65   17
4      4.1   31
5      .25    8
6      1.9   16

[Figure: scatterplot of the six cases, X from 0 to 4, Y from 0 to 30. X-bar = 1.8, Y-bar = 18.8]
Regression Example
• Slope = covariance (X and Y) / variance of X
– X-bar = 1.8, Y-bar = 18.8
Case   X     Y    X Dev    Y Dev    XD*YD
1      2.6   28    0.8      9.2      7.36
2      1.4   13   -0.4     -5.8      2.32
3      .65   17   -1.15    -1.8      2.07
4      4.1   31    2.3     12.2     28.06
5      .25    8   -1.55   -10.8     16.74
6      1.9   16    0.1     -2.8     -0.28

Sum of X deviation * Y deviation = 56.27
Regression Example
• Calculating the covariance:
sYX = Σi (Yi − Ȳ)(Xi − X̄) / (N − 1) = 56.27 / (6 − 1) ≈ 11.25
• Standard deviation of X = 1.4
• Variance = square of S.D. = 1.96
• Finally:
bYX = sYX / s²X = 11.25 / 1.96 ≈ 5.7
a = Ȳ − bYX·X̄ = 18.8 − 5.7(1.8) ≈ 8.5
Regression Example
• Results: Slope b = 5.7, constant a = 8.5
• Equation: TestScore = 8.5 + 5.7*HrsStudied
• Question: What is the interpretation of b?
• Answer: For every hour studied, test scores
increase by 5.7 points
• Question: What is the interpretation of the
constant?
• Answer: Individuals who studied zero hours are
predicted to score 8.5 on the test.
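The hand calculation can be checked by recomputing b = sYX / s²X directly from the six cases. A sketch using unrounded means (rather than 1.8 and 18.8) follows; small differences in the last digits between hand-rounded figures and exact computation are expected:

```python
# Recompute the study-time example with exact (unrounded) means.
xs = [2.6, 1.4, 0.65, 4.1, 0.25, 1.9]   # hours studied per day
ys = [28, 13, 17, 31, 8, 16]            # reading test scores
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n

cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)
var_x = sum((x - x_bar) ** 2 for x in xs) / (n - 1)

b = cov / var_x            # slope
a = y_bar - b * x_bar      # constant
print(f"b = {b:.2f}, a = {a:.2f}")
```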
Computing Regressions
• Regression coefficients can be calculated in SPSS
– You will rarely, if ever, do them by hand
• SPSS will estimate:
– The value of the constant (a)
– The value of the slope (b)
– Plus, a large number of related statistics and results of
hypothesis testing procedures
Example: Education & Job Prestige
• Example: Years of Education versus Job Prestige
– Previously, we made an “eyeball” estimate of the line
[Figure: the education (EDUCATN, 0–20) vs. prestige scatterplot with our estimate drawn in: Y = 5 + 3X]
Example: Education & Job Prestige
• The actual SPSS regression results for that data:

Model Summary
Model   R      R Square   Adjusted R Square   Std. Error of the Estimate
1       .521   .272       .271                12.40
Predictors: (Constant), HIGHEST YEAR OF SCHOOL COMPLETED

Coefficients
                                   B       Std. Error   Beta   t        Sig.
(Constant)                         9.427   1.418               6.648    .000
HIGHEST YEAR OF SCHOOL COMPLETED   2.487   .108         .521   23.102   .000
Dependent Variable: RS OCCUPATIONAL PRESTIGE SCORE

Estimates of a and b: “Constant” = a = 9.427; slope for “year of school” = b = 2.487
• Equation: Prestige = 9.4 + 2.5·Education
• A year of education adds 2.5 points of job prestige
Example: Education & Job Prestige
• Comparing our “eyeball” estimate to the actual
OLS regression line
[Figure: the education vs. prestige scatterplot showing both our estimate, Y = 5 + 3X, and the actual OLS regression line computed in SPSS]
R-Square
• The R-Square statistic indicates how well the
regression line “explains” variation in Y
• It is based on partitioning variance into:
• 1. Explained (“regression”) variance
– The portion of deviation from Y-bar accounted for by
the regression line
• 2. Unexplained (“error”) variance
– The portion of deviation from Y-bar that is “error”
• Formula:
R²YX = SSREGRESSION / SSTOTAL = s²YX / (s²X · s²Y)
R-Square
• Visually: Deviation is partitioned into two parts
[Figure: the line Y = 2 + .5X with a horizontal line at Y-bar. The deviation of a point from Y-bar is split into “explained variance” (from Y-bar to the line) and “error variance” (from the line to the point)]
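A sketch of the partition using the study-time example from earlier; R² is computed here as 1 − SSerror/SStotal, which equals SSregression/SStotal:

```python
# Partition total variation in Y into explained and error parts.
xs = [2.6, 1.4, 0.65, 4.1, 0.25, 1.9]
ys = [28, 13, 17, 31, 8, 16]
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n

# Least-squares line: b = covariance / variance of X
cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)
var_x = sum((x - x_bar) ** 2 for x in xs) / (n - 1)
b = cov / var_x
a = y_bar - b * x_bar

ss_total = sum((y - y_bar) ** 2 for y in ys)                    # total variation
ss_error = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # unexplained
r_squared = 1 - ss_error / ss_total                             # explained share
print(f"R-squared = {r_squared:.3f}")
```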
Example: Education & Job Prestige
• R-Square & Hypothesis testing information:
Model Summary
Model   R      R Square   Adjusted R Square   Std. Error of the Estimate
1       .521   .272       .271                12.40
Predictors: (Constant), HIGHEST YEAR OF SCHOOL COMPLETED

The R and R-Square indicate how well the line summarizes the data

Coefficients
                                   B       Std. Error   Beta   t        Sig.
(Constant)                         9.427   1.418               6.648    .000
HIGHEST YEAR OF SCHOOL COMPLETED   2.487   .108         .521   23.102   .000
Dependent Variable: RS OCCUPATIONAL PRESTIGE SCORE

This information allows us to do hypothesis tests about the constant and slope
Hypothesis Tests: Slopes
• Given: Observed slope relating Education to Job
Prestige = 2.487
• Question: Can we generalize this to the
population of all Americans?
• How likely is it that this observed slope was actually drawn
from a population with slope = 0?
• Solution: Conduct a hypothesis test
• Notation: sample slope = b, population slope = β
• H0: Population slope β = 0
• H1: Population slope β ≠ 0 (two-tailed test)
Example: Slope Hypothesis Test
• The actual SPSS regression results for that data:
Model Summary
Model   R      R Square   Adjusted R Square   Std. Error of the Estimate
1       .521   .272       .271                12.40
Predictors: (Constant), HIGHEST YEAR OF SCHOOL COMPLETED

The t-value and “Sig.” (p-value) are for hypothesis tests about the slope

Coefficients
                                   B       Std. Error   Beta   t        Sig.
(Constant)                         9.427   1.418               6.648    .000
HIGHEST YEAR OF SCHOOL COMPLETED   2.487   .108         .521   23.102   .000
Dependent Variable: RS OCCUPATIONAL PRESTIGE SCORE

• Reject H0 if: t-value > critical t (N − 2 df)
• Or, “Sig.” (p-value) less than α (often α = .05)
Hypothesis Tests: Slopes
• What information lets us do a hypothesis test?
• Answer: Estimates of a slope (b) have a
sampling distribution, like any other statistic
– It is the distribution of every value of the slope, based
on all possible samples (of size N)
• If certain assumptions are met, the sampling
distribution approximates the t-distribution
– Thus, we can assess the probability that a given value
of b would be observed, if β = 0
– If probability is low – below alpha – we reject H0
Hypothesis Tests: Slopes
• Visually: If the population slope (β) is zero, then
the sampling distribution would center at zero
– Since the sampling distribution is a probability
distribution, we can identify the likely values of b if
the population slope is zero
[Figure: sampling distribution of the slope b, centered at 0. If β = 0, observed slopes should commonly fall near zero, too. If the observed slope falls very far from 0, it is improbable that β is really equal to zero; thus, we can reject H0]
Regression Assumptions
• Assumptions of simple (bivariate) regression
• If assumptions aren’t met, hypothesis tests may be inaccurate
– 1. Random sample w/ sufficient N (N > ~20)
– 2. Linear relationship among variables
• Check scatterplot for non-linear pattern; (a “cloud” is OK)
– 3. Conditional normality: Y = normal at all values of X
• Check histograms of Y for normality at several values of X
– 4. Homoskedasticity – equal error variance at all values
of X
• Check scatterplot for “bulges” or “fanning out” of error
across values of X
– Additional assumptions are required for multivariate
regression…
Bivariate Regression Assumptions
• Normality:
Examine sub-samples at different values of X.
Make histograms and check for normality.
[Figure: scatterplot of HAPPY vs. INCOME ($0–$100,000), with histograms of HAPPY for two sub-samples.
Good: one sub-sample is roughly normal (Mean = 3.84, Std. Dev = 1.51, N = 60).
Not very good: the other is badly skewed (Mean = 4.58, Std. Dev = 3.06, N = 60)]
Bivariate Regression Assumptions
• Homoskedasticity: Equal Error Variance
Examine error at
different values of X.
Is it roughly equal?
[Figure: scatterplot of Y (0–10) vs. INCOME ($0–$100,000) with a fitted line; the spread of points around the line is similar at all values of X. Here, things look pretty good.]
Bivariate Regression Assumptions
• Heteroskedasticity: Unequal Error Variance
[Figure: scatterplot of Y (0–10) vs. INCOME ($0–$100,000); at higher values of X, error variance increases a lot. This looks pretty bad.]
Regression Hypothesis Tests
• If assumptions are met, the sampling distribution
of the slope (b) approximates a T-distribution
• Standard deviation of the sampling distribution is called the
standard error of the slope (sb)
• Population formula of standard error:
sb = √[ s²e / Σi (Xi − X̄)² ]
• Where s²e is the variance of the regression error
Regression Hypothesis Tests
• Estimating s²e lets us estimate the standard error:
ŝ²e = Σi e²i / (N − 2) = SSERROR / (N − 2) = MSERROR
• Now we can estimate the S.E. of the slope:
ŝb = √[ MSERROR / Σi (Xi − X̄)² ]
Regression Hypothesis Tests
• Finally: A t-value can be calculated:
• It is the slope divided by the standard error
tN−2 = bYX / sb = bYX / √[ MSERROR / (s²X (N − 1)) ]
• Where sb is the sample point estimate of the S.E.
• The t-value is based on N-2 degrees of freedom
• Reject H0 if observed t > critical t (e.g., 1.96).
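Putting the pieces together on the study-time example (a sketch mirroring the formulas above, not SPSS output):

```python
import math

# Slope t-test: t = b / SE(b), with SE(b) = sqrt(MS_error / sum((X - X_bar)^2))
xs = [2.6, 1.4, 0.65, 4.1, 0.25, 1.9]
ys = [28, 13, 17, 31, 8, 16]
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n

cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)
ssx = sum((x - x_bar) ** 2 for x in xs)      # sum of squared X deviations
b = (cov * (n - 1)) / ssx                    # same as cov / var_x
a = y_bar - b * x_bar

ss_error = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
ms_error = ss_error / (n - 2)                # estimated error variance
se_b = math.sqrt(ms_error / ssx)             # standard error of the slope
t = b / se_b
print(f"t = {t:.2f} on {n - 2} df")          # compare to critical t
```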
Example: Education & Job Prestige
• T-values can be compared to critical t...
Coefficients
                                   B       Std. Error   Beta   t        Sig.
(Constant)                         9.427   1.418               6.648    .000
HIGHEST YEAR OF SCHOOL COMPLETED   2.487   .108         .521   23.102   .000
Dependent Variable: RS OCCUPATIONAL PRESTIGE SCORE

SPSS estimates the standard error of the slope. This is used to calculate a t-value.
The t-value can be compared to the “critical value” to test hypotheses. Or, just compare “Sig.” to alpha.
If t > crit or Sig < alpha, reject H0