ETA, bivariate regression, R

SPS 580 Lecture 5
eta Bivariate regression F Rsq
notes
I. MORE STRATEGIES ABOUT MAKING A SCALE TO TEST A THEORY
Idea . . . Higher-income households are less pessimistic about economic change in their neighborhood
Theory . . . Income → Attitude on Neighborhood Economic Future
THREE VARIABLES FOR NEIGHBORHOOD PESSIMISM SCALE
nbrchg   = Neighborhood Changes In Past Five Years
nbrfutr  = In 5 Yrs Neighborhood Will Be
nbinvest = Would Invest In Current Neighborhood

ORIGINAL CODE   nbrchg            nbrfutr           nbinvest                    RECODE
1               Better   2,372    Better   2,748    Good Investment    7,125    = 0
2               Worse    1,727    Worse    1,541    Better Elsewhere   1,775    = 1
3               Same     4,886    Same     4,545                                = 0
8               DK         149    DK         300    DK                   239    = 0
7               Ref          3    Ref          2    Ref                   15    = 9 MIS
9               NA           1    NA           1    NA                     0    = 9 MIS

Neighborhood pessimism scale (number of pessimistic answers, 0 to 3)
Score     0      1      2      3      Total
Percent   67%    18%    10%    5%     100%
→ an OK scale
→ Skewed distribution (not good): scores 1 to 3 form the tail (= skew)
Nice scale: roughly even shares across the four scores (30%, 26%, 19%, 25%)
Not skewed at all . . . No tail
Mean = 1.54 . . . Lots of variance
Maximum chance for finding significant differences

Horrible scale (skewed): 0 = 91%, 1 = 6%, 2 = 2%, 3 = 1%
Nothing but tail, in reality at best a dichotomy
Mean = 0.13 (i.e. almost no variance)
Almost everybody already has the same score
No differences will be significant
RULE OF THUMB: If you have a choice, code component variables to maximize scale variance
E.g., Sum of (“Very satisfied” = 1) vs. Sum of (“Very satisfied + somewhat satisfied” =1)
Unless you have a really good reason not to (e.g., policy importance of the “rare” cases)
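To see why the rule of thumb matters, here is a minimal sketch that computes the mean and variance implied by a percentage distribution. The two distributions are the rounded percentages from the bar charts in these notes, so the results are approximate; the function name is just for illustration.

```python
def mean_and_variance(shares):
    """shares[k] is the proportion of respondents with scale score k."""
    mean = sum(score * p for score, p in enumerate(shares))
    variance = sum(p * (score - mean) ** 2 for score, p in enumerate(shares))
    return mean, variance

# Rounded percentage distributions for scores 0..3, read off the notes.
ok_scale = [0.67, 0.18, 0.10, 0.05]
horrible_scale = [0.91, 0.06, 0.02, 0.01]

for name, shares in [("OK scale", ok_scale), ("Horrible scale", horrible_scale)]:
    mean, variance = mean_and_variance(shares)
    print(f"{name}: mean = {mean:.2f}, variance = {variance:.2f}")
```

The horrible scale ends up with well under half the variance of the OK scale, which is the sense in which "almost everybody already has the same score."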
II. INTRO TO BIVARIATE REGRESSION
Theory . . . Income → Attitude on Neighborhood Economic Future
A. CODE YOUR VARIABLES
X (0,1) → Y (int)
INCOME RECODED TO (0,1) DICHOTOMY
* Label the original income percentile variable and declare 99 as missing.
variable labels incomeadj 'HH income percentile in year survey was conducted'.
value labels incomeadj 0 '<10%' 1 '10-25%' 2 '25-33%' 3 '33-50%' 4 '50-66%' 5 '66-75%' 6 '75-90%' 7 '90%+'.
missing values incomeadj (99).
* Collapse to a dichotomy: codes 0 thru 3 become 0, codes 4 and up become 1, anything else becomes 9 (missing).
RECODE incomeadj (0 thru 3=0) (4 thru 9=1) (ELSE=9) INTO income50pct.
VARIABLE LABELS income50pct 'above or below median'.
value labels income50pct 0 'below median' 1 'above median'.
missing values income50pct (9).
→ ALWAYS CODE DICHOTOMIES (0,1), whether they are X (indep) or Y (dep)
→ THERE ARE NO GOOD REASONS NOT TO DO THIS
B. DESCRIBE DEPENDENT VARIABLE
Neighborhood pessimism scale
Low    High   Mean   Variance
0      3      0.52   0.7280
→ PQ table explains the dependent variable
→ Mean/range = Index of skew (here 0.52 / 3 = 0.17)
C. PERCENTAGE TABLE → NOT USUALLY DONE, BUT CRUCIAL TO DO ONCE IN ORDER TO UNDERSTAND THIS STUFF
Conditional distributions of %s
                     Score on Neighborhood Pessimism Scale
Household Income     0      1      2      3      total
0 Below median       58%    23%    13%    6%     100%
1 Above median       75%    14%    7%     3%     100%
Chi Square (3) = 198   p < .05
Phi = .188
Is there a (sig) difference?
Where’s the action?
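A quick check on how phi relates to chi-square, assuming the usual definition phi = sqrt(chi-square / N). The N of 5,576 is taken from the Total row of the Table of Means in section D; treating it as the N behind this cross-tab is an assumption.

```python
import math

def phi_from_chi_square(chi_square, n):
    """Phi as the square root of chi-square divided by the number of cases."""
    return math.sqrt(chi_square / n)

# Chi-square from the percentage table; N assumed to be the total N (5,576).
phi = phi_from_chi_square(198, 5576)
print(f"phi = {phi:.3f}")
```

Because phi divides out N, it indexes the size of the relationship, whereas chi-square alone mostly answers "is it significant?"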
D. TABLE OF MEANS
Score on Neighborhood Pessimism Scale
Conditional means, variances
Household Income     Mean    Variance   SEM      N
0 Below median       .681    .8551      .01788   2,675
1 Above median       .386    .5834      .01418   2,901
Total                .528    .7354      .01148   5,576
Difference = .386 - .681 = -.295
SE(Diff) = .023
T (like chi sq) = -12.9   p < .05
Eta (like phi) = .172
Is it sig? Is it large, medium or small?
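The arithmetic behind the SEM, SE(Diff) and T figures can be sketched as follows. This assumes SEM = sqrt(variance / N) and an unpooled standard error for the difference (square root of the sum of the squared SEMs); the notes do not say which formula was used, but these reproduce the table's values.

```python
import math

# Conditional means, variances and Ns from the Table of Means.
below = {"mean": 0.681, "variance": 0.8551, "n": 2675}
above = {"mean": 0.386, "variance": 0.5834, "n": 2901}

def sem(group):
    """Standard error of the mean for one group."""
    return math.sqrt(group["variance"] / group["n"])

difference = above["mean"] - below["mean"]
# Assumed formula: unpooled standard error of a difference between two means.
se_diff = math.sqrt(sem(below) ** 2 + sem(above) ** 2)
t = difference / se_diff

print(f"SEM below = {sem(below):.5f}, SEM above = {sem(above):.5f}")
print(f"difference = {difference:.3f}, SE(diff) = {se_diff:.3f}, t = {t:.1f}")
```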
E. GRAPH THE CONDITIONAL MEANS
Total Variance Var Y = .7354
Conditional Variances . . .
Var Y | X=0 = .8551
Var Y | X=1 = .5834
Avg Conditional Variance = .7138
Eta Squared = 1 - (Avg Conditional Variance / Total Variance) = 1 - (.7138 / .7354) = .03
Measures how much variance in Y is “explained” by X ( = 3%)
Eta = SQRT(Eta^2) . . . like phi, a coefficient that indexes magnitude of correlation
[Bar graph: Score on Neighborhood Pessimism Scale (axis .00 to .80) by income group: 0 Below median income = .68, 1 Above median income = .39]
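A minimal sketch of the eta-squared calculation, assuming the average conditional variance is an N-weighted average of the two group variances (the notes do not state the weighting, so this is an assumption; it lands within rounding of the .7138 shown).

```python
import math

total_variance = 0.7354
# (N, conditional variance of Y) for X = 0 and X = 1, from the Table of Means.
groups = [(2675, 0.8551), (2901, 0.5834)]

n_total = sum(n for n, _ in groups)
# Assumed weighting: each group's variance counts in proportion to its N.
avg_conditional = sum(n * variance for n, variance in groups) / n_total

eta_squared = 1 - avg_conditional / total_variance
eta = math.sqrt(eta_squared)

print(f"avg conditional variance = {avg_conditional:.4f}")
print(f"eta squared = {eta_squared:.3f}, eta = {eta:.3f}")
```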
III. AND NOW, HERE’S JOHNNY . . . REGRESSION
A. WHAT IT’S ABOUT
Regression is based on the idea of an equation to predict the average score of Y as a function of X
Predicted average (Y) = a + B(x)   (how the math works)
Y = a + B(x)   (usually expressed this way)
a = the intercept, the predicted average on Y when X = 0
B = the slope = how much the predicted average on Y changes when X goes up by 1
So in our example a = .681 = predicted average on Y for below median income (when X = 0)
And B = -.295 = how much the predicted average on Y changes when X goes up by 1
REGRESSION EQUATION . . . Y = .681 -.295(x)
Solve for predicted values of Y . . .
below median $ . . . Y = .681 -.295*0 = .681
above median $ . . . Y = .681 -.295*1 = .386
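To illustrate why the intercept and slope line up with the group means when X is a (0,1) dichotomy, here is a small sketch. The eight data points are made up for illustration and are NOT the survey data; the `ols` and `predict` helpers are hypothetical names.

```python
from statistics import mean

# Hypothetical illustration data (not the survey data): x is a (0,1) dichotomy.
x = [0, 0, 0, 0, 1, 1, 1, 1]
y = [1, 0, 2, 1, 0, 0, 1, 1]

def ols(x, y):
    """Least-squares intercept a and slope B for Y = a + B(x)."""
    x_bar, y_bar = mean(x), mean(y)
    numerator = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    denominator = sum((xi - x_bar) ** 2 for xi in x)
    slope = numerator / denominator
    intercept = y_bar - slope * x_bar
    return intercept, slope

def predict(a, b, x_value):
    """Predicted average of Y at a given X."""
    return a + b * x_value

a, b = ols(x, y)
mean_y_x0 = mean(yi for xi, yi in zip(x, y) if xi == 0)
mean_y_x1 = mean(yi for xi, yi in zip(x, y) if xi == 1)

print(f"a = {a}, mean of Y when X=0 = {mean_y_x0}")
print(f"B = {b}, difference in means = {mean_y_x1 - mean_y_x0}")
# The equation from the notes, solved for above median income (X = 1).
print(round(predict(0.681, -0.295, 1), 3))
```

With a dichotomous X the fitted line has only two points to hit, so the intercept is the X = 0 mean and the slope is the difference between the two means.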
B. GET THE COMPUTER TO DO IT
ANALYZE > REGRESSION > LINEAR > Dependent: nbhdscale > Independent: income50pct > OPTIONS: Exclude cases pairwise > STATISTICS: Descriptives > CONTINUE > OK
The output for the regression equation is in the section on Coefficients
Coefficients (Model 1)
                                     Unstandardized        Standardized
                                     B       Std. Error    Beta           t          Sig.
(Constant)                           .669    .016                         42.280     .000    Intercept = a
income50pct above or below median    -.293   .023          -.172          -13.030    .000    Slope = B
Y = .669 -.293(x)
T-test for the slope = -13   p < .05
Model Summary
R      R Square   Adjusted R Square   Std. Error of the Estimate
.172   .030       .029                .84057
The output for explained variance is in the Model Summary
R^2 = explained variance = .029 (.030 unadjusted)
It measures the “goodness of fit” of the linear regression model to the observed data
It is the same as ETA^2 when X is a dichotomy; in general that will not be the case
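A rough check of how R, R Square and Adjusted R Square fit together, using the standard adjustment 1 - (1 - R^2)(n - 1)/(n - k - 1). The n of 5,576 is borrowed from the Table of Means; since the regression was run with pairwise exclusion, the actual n is an assumption here.

```python
r = 0.172          # R from the Model Summary
n = 5576           # assumed number of cases (total N from the Table of Means)
k = 1              # one predictor

r_squared = r ** 2
adjusted_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

print(f"R squared = {r_squared:.3f}")
print(f"Adjusted R squared = {adjusted_r_squared:.3f}")
```

With thousands of cases and one predictor the adjustment is tiny, which is why the two figures differ only in the third decimal.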
C. SUMMARIZE THE RESULTS
Impact on Neighborhood Pessimism
Predictor (coding)   Slope    T-test     significant?   R Sq
Income (0,1)         -.293    -13.030    yes            .029
→ PQ
IV. MAKE X AN INTERVAL VARIABLE
X (int 0,3) → Y (int)   ALWAYS CODE INTERVAL VARS STARTING WITH (0)
A. Look over the pattern
                     Score on Neighborhood Pessimism Scale
Household Income     0      1      2      3      total
0 Lowest qtr         52%    25%    15%    7%     100%
1 Second qtr         64%    20%    12%    5%     100%
2 Third qtr          70%    17%    8%     4%     100%
3 Top qtr            80%    11%    6%     3%     100%
Whenever feasible, look at the XTAB to see what’s going on
ChiSq(9) = 270   p < .05
Phi = .22
Is it sig, where’s the action?
B. Table of Means
Household Income            Neighborhood Pessimism Score (mean)
0 Lowest Income Quarter     0.78
1 Second qtr                0.57
2 Third qtr                 0.46
3 Top qtr                   0.31
Total                       0.53
→ 3+ means . . . can’t do a simple T-test
→ ETA^2 = .040 (4% explained variance)
→ ETA = .200
C. Run the Regression
Coefficients (a) (Model 1)
                          Unstandardized        Standardized
                          B       Std. Error    Beta           t          Sig.
(Constant)                .752    .019                         40.087     .000
incomeQUARTER quarter     -.152   .010          -.199          -15.150    .000
a. Dependent Variable: nbhdscale
D. Write the equation, solve for Y as a function of X
Y = .752 -.152 (x)
x =            0      1      2      3
Predicted Y    0.75   0.60   0.45   0.30
→ linear pattern of predicted means (LINEAR Regression)
E. Plot the Observed and Predicted Means
[Line graph: Impact of Household Income on Neighborhood Pessimism Score. Observed and Predicted means (axis 0.00 to 0.80) by income quarter: 0 Lowest Income Quarter, 1 Second qtr, 2 Third qtr, 3 Top qtr]
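What the plot compares can also be laid out numerically. The sketch below takes the observed means from the Table of Means and the predicted means from Y = .752 - .152(x), then looks at the step from one income quarter to the next; it uses the rounded coefficients, so the last digit may differ from SPSS output.

```python
# Observed mean pessimism score by income quarter (x = 0, 1, 2, 3).
observed = [0.78, 0.57, 0.46, 0.31]

# Rounded coefficients from the Coefficients table.
a, b = 0.752, -0.152
predicted = [a + b * x for x in range(4)]

# Change in the mean when moving up one income quarter.
observed_steps = [round(observed[i + 1] - observed[i], 2) for i in range(3)]
predicted_steps = [round(predicted[i + 1] - predicted[i], 3) for i in range(3)]

print("predicted means:", [round(p, 2) for p in predicted])
print("observed steps: ", observed_steps)
print("predicted steps:", predicted_steps)
```

The predicted steps are all equal to the slope, which is what "linear" means here; the observed steps vary around that value.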
F. SUMMARIZE THE RESULTS
Impact on Neighborhood Pessimism
Predictor              Slope    T-test   significant?   R Sq   Eta Sq
Income Quarter (0,3)   -.152    -15.1    yes            .039   .040
→ PQ
Difference between R^2 and Eta^2 shows the difference between a LINEAR prediction model and a CURVILINEAR prediction model
Papers had several curvilinear patterns: Age → HH electronics, financial service use, social taxes, safety of parks, honesty of charities
→ Topic of curves is for the future
V. MAKE Y A DICHOTOMOUS VARIABLE
A. Advantageous choice of Y recode to highlight data pattern (causal relationship) and to simplify the analysis and presentation
                     Score on Neighborhood Pessimism Scale
Household Income     0      1      2      3      total
0 Lowest qtr         52%    25%    15%    7%     100%
1 Second qtr         64%    20%    12%    5%     100%
2 Third qtr          70%    17%    8%     4%     100%
3 Top qtr            80%    11%    6%     3%     100%
Recode Y options (where to cut the score columns):
(0 vs. 1+)
(0-1 vs. 2+)
(0-2 vs. 3)
CONSIDERATIONS ON THE OPTIMAL RECODE
Highlight the data pattern, Maximize variance, Focus on policy-relevant group (if there is one)
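One way to compare the three candidate recodes is to see what share of each income quarter would be coded 1 under each cut. This sketch works from the rounded row percentages in the cross-tab, so shares can be off by a point; the helper name is hypothetical.

```python
# Row percentages for scores 0, 1, 2, 3 by income quarter (rounded, from the cross-tab).
xtab = {
    "Lowest qtr": [52, 25, 15, 7],
    "Second qtr": [64, 20, 12, 5],
    "Third qtr": [70, 17, 8, 4],
    "Top qtr": [80, 11, 6, 3],
}

def percent_coded_one(row, cut):
    """Percent recoded to 1 when scores at or above `cut` become 1 and the rest 0."""
    return 100 - sum(row[:cut])

for cut, label in [(1, "0 vs 1+"), (2, "0-1 vs 2+"), (3, "0-2 vs 3")]:
    shares = [percent_coded_one(row, cut) for row in xtab.values()]
    # Variance of a (0,1) variable is p * (1 - p); shown for the lowest quarter.
    p = shares[0] / 100
    print(f"{label}: % coded 1 by quarter = {shares}, variance in lowest qtr = {p * (1 - p):.2f}")
```

The (0 vs 1+) cut keeps the shares closest to 50%, so it gives the most variance, and it also shows the widest spread between the lowest and top quarters.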
B. Look over the pattern, Table of Means
When Y is (0,1) the proportion and the mean are the same thing, so you only need one table
Household Income            Percent pessimistic
0 Lowest Income Quarter     48%
1 Second qtr                36%
2 Third qtr                 30%
3 Top qtr                   20%
Total                       33%
Chi Sq(3) = 264   p < .05
Phi = .217
Eta = .217
Eta^2 = .047
C. Graph the data
[Bar graph: Percent pessimistic (axis 0% to 60%) by income quarter: 0 Lowest Income Quarter = 48%, 1 Second qtr = 36%, 2 Third qtr = 30%, 3 Top qtr = 20%]
D. Perform the Regression
Coefficients (a) (Model 1)
                          Unstandardized        Standardized
                          B       Std. Error    Beta           t          Sig.
(Constant)                .468    .010                         45.413     .000
incomeQUARTER quarter     -.091   .005          -.217          -16.557    .000
E. Write the equation, solve for Y as a function of X
Y = .468 - .091 (x)
x =            0      1      2      3
Predicted Y    0.47   0.38   0.29   0.19
→ linear pattern of Exp(Y) = a + B(x)
F. Compare Observed Y with Predicted Y
Examination of Outliers, Residuals
Income Qtr   Observed Y   Predicted Y   Residual (Difference)
x = 0        0.48         0.47          0.01
x = 1        0.36         0.38          -0.01
x = 2        0.30         0.29          0.01
x = 3        0.20         0.19          0.00
→ Residual means Observed minus Predicted
→ pattern of residuals tells you if there is curvilinearity (along with inspection of graph, and (ETA^2 - R^2))
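The residual check can be sketched as below. It uses the rounded coefficients from Y = .468 - .091(x), so individual residuals can differ from the table in the last digit (the table appears to come from unrounded output).

```python
# Observed proportion pessimistic by income quarter (x = 0, 1, 2, 3).
observed = [0.48, 0.36, 0.30, 0.20]

# Rounded coefficients from the Coefficients table.
a, b = 0.468, -0.091

residuals = []
for x, obs in enumerate(observed):
    predicted = a + b * x
    residual = obs - predicted          # Residual = Observed minus Predicted
    residuals.append(residual)
    print(f"x = {x}: observed = {obs:.2f}, predicted = {predicted:.3f}, residual = {residual:+.3f}")

largest = max(abs(r) for r in residuals)
print(f"largest absolute residual = {largest:.3f}")
```

Here every residual is under two percentage points and the signs show no steady drift, which fits R^2 and Eta^2 being equal (.047) for this recode.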
G. SUMMARIZE THE RESULTS
Impact on Neighborhood Pessimism
Predictor              Slope    T-test   significant?   R Sq   Eta Sq
Income Quarter (0,3)   -.091    -16.6    yes            .047   .047
→ PQ
ASSIGNMENT 5:
A. Develop a theory of interest relating two interval-level measures X and Y
1. Create an interval dependent variable (Y) of interest. Recode as necessary to deal with
outliers, skewness. Show the PQ percentage BAR GRAPH for the scale.
2. Create an interval independent variable (X) of interest. Recode as necessary to deal with
outliers, skewness. Show the PQ percentage BAR GRAPH for the scale.
B. Analyze the data according to three different data analysis situations
1. X and Y both interval
2. X dichotomous, Y interval
3. X dichotomous, Y dichotomous
C. For each data analysis situation:
1. Show the pattern, show the Table of Means
2. Graph the data
3. Perform the regression – show the result
4. Write the equation, solve for Y as a function of X – show illustrative results
5. Compare Observed Y with Predicted Y – comment on curvilinearity, outliers
6. Summarize the results in a table
7. State your conclusion
D. In each section, use enough English so I can follow what you are doing without having to
memorize computer words.