Linear Regression: Making Sense of
Regression Results

Interpreting Stata regression output
  Coefficients for independent variables
  Fit of the regression: R Square
Statistical significance
  How to reject the null hypothesis
Multivariate regressions
  College graduation rates
  Ethnicity and voting

SPSS Output – We’ll Use Stata –
Benefit in Knowing Two Packages
[Figure: scatterplot of Graduation Rate (Y axis, 0 to 100) against Average SAT Score (X axis, 0 to 1600) with a fitted regression line, Rsq = 0.3454. Callouts mark the slope or “coefficient,” the Y-intercept or “constant,” and ask “How tight is the fit?”]
Interpreting regression output

Regression output typically includes two
key tables for interpreting your results:

A “Coefficients” table that contains the y-intercept (or “constant”) of the regression, a
coefficient for every independent variable,
and the standard error of that coefficient.

A “Model Summary” table that gives you
information on the fit of your regression.
Interpreting SPSS (another statistical
package) regression: Coefficients – 1
Coefficients (a)

                       Unstandardized          Standardized
                       Coefficients            Coefficients
Model 1                B          Std. Error   Beta           t        Sig.
(Constant)             4.236      7.048                       .601     .549
Average SAT Score      5.88E-02   .007         .588           8.778    .000

a. Dependent Variable: Graduation Rate
• The y-intercept is 4.2% with a standard error of 7.0%.
• The coefficient for SAT Scores is 0.059%, with a
standard error of 0.007%. Standardized coefficients
are discussed later.
Interpreting regression output:
Coefficients - 2

The y-intercept or constant is the
predicted value of the dependent variable
when the independent variable takes on
the value of zero.
This basic model predicts that when a
college admits a class of students who
averaged zero on their SAT, 4.2% of them
will graduate.
• The constant is not the most helpful statistic.

Interpreting regression output:
Coefficients - 3

The coefficient of an independent variable
is the predicted change in the dependent
variable that results from a one unit
increase in the independent variable.
A college with students whose SAT scores are
one point higher on average will have a
predicted graduation rate that is 0.059% higher.
• Increasing SAT scores by 200 points leads to a
predicted (200)(0.059%) = 11.8% rise in graduation rates.
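The arithmetic on this slide can be checked with a few lines of Python. This is only a sketch: the two constants are copied from the B column of the SPSS table, while the function names and the SAT value of 1000 are hypothetical, chosen for illustration.

```python
# Coefficients read from the B column of the SPSS "Coefficients" table.
INTERCEPT = 4.236   # predicted graduation rate (%) when average SAT is zero
SLOPE = 5.88e-02    # predicted change in graduation rate (%) per SAT point


def predicted_grad_rate(avg_sat):
    """Predicted graduation rate (%) for a given average SAT score."""
    return INTERCEPT + SLOPE * avg_sat


def predicted_change(delta_sat):
    """Predicted change in graduation rate (%) for a change in average SAT."""
    return SLOPE * delta_sat


# A 200-point increase: 200 * 0.0588 = 11.76, which the slide rounds to 11.8.
print(round(predicted_change(200), 2))

# Hypothetical college with an average SAT of 1000.
print(round(predicted_grad_rate(1000), 1))
```

Note that the change in the prediction depends only on the slope; the constant drops out when two colleges are compared.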

Interpreting regression output: Fit of
the Regression
Model Summary

Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
1       .588 (a)   .345       .341                12.45%

a. Predictors: (Constant), Average SAT Score
The R Square measures how closely a regression line
fits the data in a scatterplot.
• It can range from zero (no explanatory power) to one
(perfect prediction).
• An R Square of 0.345 means that differences in SAT
scores can explain 35% of the variation in college
graduation rates. Key sentence for quizzes!
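To make the idea of “share of variation explained” concrete, here is a minimal sketch that computes R Square for a one-variable regression as 1 minus (residual variation / total variation). The data below are made up for illustration and are not the college data set from the slides; the function name is also hypothetical.

```python
def r_squared(x, y):
    """R Square for a bivariate least-squares regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Total variation in y, and the part the line fails to explain.
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    ss_resid = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    return 1 - ss_resid / ss_total


# HYPOTHETICAL data, for illustration only.
sat = [900, 1000, 1100, 1200, 1300]
on_the_line = [4 + 0.06 * s for s in sat]   # every point sits on a line
scattered = [50, 70, 55, 80, 65]            # a looser relationship

print(round(r_squared(sat, on_the_line), 3))  # perfect fit gives 1.0
print(round(r_squared(sat, scattered), 3))    # about 0.281
```

When every point falls on the line, nothing is left unexplained and R Square is one; the looser the scatter, the closer it gets to zero.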
Statistical Significance - 1

What would the null hypothesis look like
in a scatterplot?

If the independent variable has no effect on
the dependent variable, the scatterplot
should look random, the regression line
should be flat, and its slope should be zero.

Null hypothesis: The regression coefficient
for an independent variable equals zero.
Statistical Significance - 2

Our formal test of statistical significance
asks whether we can be SURE that a
regression coefficient DIFFERS from zero.
The “standard error” is the standard deviation
of the coefficient's sampling distribution.
• If a coefficient is more than two standard
errors away from zero, we can reject the null
hypothesis (that it equals zero).

Statistical Significance - 3

So, if a coefficient is more than TWICE the
size of its standard error, we REJECT the
NULL hypothesis with 95% confidence.
• This works whether the coefficient is
negative or positive.
• The coefficient/standard error ratio is called
the “test statistic” or “t-stat.”
• A t-stat bigger than 2 or less than -2
indicates a statistically significant effect.
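The rule of thumb above can be written as a short sketch. The first two calls use the coefficient and standard error of the constant and of Average SAT Score from the earlier SPSS table; the last pair of numbers is hypothetical, included only to show that the rule works for negative coefficients. Function names are illustrative.

```python
def t_stat(coef, std_err):
    """Test statistic: the coefficient divided by its standard error."""
    return coef / std_err


def rejects_null(coef, std_err, cutoff=2.0):
    """True if the coefficient is more than `cutoff` standard errors from zero."""
    return abs(t_stat(coef, std_err)) > cutoff


# Constant from the SPSS table: 4.236 / 7.048, matching the reported t of .601.
print(round(t_stat(4.236, 7.048), 3))
print(rejects_null(4.236, 7.048))      # False: cannot reject the null

# Average SAT Score: the printed standard error (.007) is rounded, so this
# ratio comes out near 8.4 rather than the reported 8.778.
print(rejects_null(5.88e-02, 0.007))   # True

# HYPOTHETICAL negative coefficient: t = -2.5, still more than 2 in size.
print(rejects_null(-0.5, 0.2))         # True
```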
Statistical Significance - 4
Regression of Tax on Cons,
Party and Stinc in Stata
      Source |       SS       df       MS              Number of obs =     100
-------------+------------------------------           F(  3,    96) =   65.44
       Model |  54886.5757     3  18295.5252           Prob > F      =  0.0000
    Residual |  26840.2643    96  279.586087           R-squared     =  0.6716
-------------+------------------------------           Adj R-squared =  0.6613
       Total |    81726.84    99  825.523636           Root MSE      =  16.721

------------------------------------------------------------------------------
         tax |      Coef.   Std. Err.      t    P>|t|                     Beta
-------------+----------------------------------------------------------------
        cons |    -.64472     .07560    -8.53   0.000                -.7010575
       party |   11.20792    4.67533     2.40   0.018                 .1902963
       stinc |    -.56008    1.28316    -0.44   0.663                -.0297112
       _cons |   67.38277   15.11393     4.46   0.000                        .
------------------------------------------------------------------------------
For which independent variables would we
reject the null hypothesis? Why?
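One way to work through the question is to recompute each t ratio from the Coef. and Std. Err. columns and apply the “bigger than 2 or less than -2” rule. The sketch below copies its numbers from the Stata output for the tax regression; the variable names in the code are illustrative. It also recomputes R-squared from the sums of squares as a check on the fit statistics.

```python
# (coefficient, standard error) pairs copied from the Stata output.
COEFS = {
    "cons": (-0.64472, 0.07560),
    "party": (11.20792, 4.67533),
    "stinc": (-0.56008, 1.28316),
}

# t ratio for each independent variable.
t_ratios = {name: coef / se for name, (coef, se) in COEFS.items()}

# Variables whose t ratio is more than 2 in absolute value.
significant = [name for name, t in t_ratios.items() if abs(t) > 2]

for name, t in t_ratios.items():
    print(f"{name:>6}  t = {t:6.2f}  reject null: {abs(t) > 2}")

# R-squared is the model sum of squares over the total sum of squares.
r_squared = 54886.5757 / 81726.84
print(round(r_squared, 4))
```

Under this rule we would reject the null for cons and party but not for stinc, whose coefficient is less than half the size of its standard error.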
Visualizing a t ratio - 1
Which of the next two slides depicts a
higher t ratio?
Visualizing a t ratio - 2
Visualizing a t ratio - 3
Multivariate Regression - 1

A “multivariate regression” uses more than
one independent variable (or confound) to
explain variation in a dependent variable.

The coefficient for each independent variable
reports its effect on the DV, holding constant all
of the other IVs in the regression.
Multivariate Regression - 2
[Diagram: Year of Founding, SAT Scores, Tuition, and Student/Faculty Ratio shown as predictors of Graduation Rates]
Multivariate Regression - 3
Coefficients (a)

                           Unstandardized          Standardized
                           Coefficients            Coefficients
Model 1                    B          Std. Error   Beta           t        Sig.
(Constant)                 59.187     47.203                      1.254    .212
Year school was founded    -2.1E-02   .023         -.072          -.917    .361
Average SAT Score          4.2E-02    .010         .410           4.224    .000
In-state Tuition           8.4E-04    .000         .208           2.109    .037
Student/faculty ratio      -.206      .329         -.054          -.626    .533

a. Dependent Variable: Graduation Rate
Multivariate Regression - 4

Holding all other factors constant, a 200 point
increase in SAT scores leads to a predicted
(200)(0.042) = 8.4% increase in the graduation
rate, and this effect is statistically significant.

Controlling for other factors, a college that is
100 years younger should have a graduation
rate that changes by (100)(-0.021) = -2.1%, i.e., 2.1% lower, but this
effect is NOT significantly different from zero.
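“Holding all other factors constant” can be illustrated by predicting for two colleges that differ on only one variable. In this sketch the coefficients come from the B column of the multivariate SPSS table; the baseline college (founded 1900, average SAT 1000, tuition 5000, student/faculty ratio 15) is hypothetical, as are the function and variable names.

```python
# Coefficients from the B column of the multivariate SPSS table.
B = {
    "constant": 59.187,
    "year_founded": -2.1e-02,
    "avg_sat": 4.2e-02,
    "tuition": 8.4e-04,
    "student_faculty": -0.206,
}


def predict(year_founded, avg_sat, tuition, student_faculty):
    """Predicted graduation rate (%) from the multivariate model."""
    return (B["constant"]
            + B["year_founded"] * year_founded
            + B["avg_sat"] * avg_sat
            + B["tuition"] * tuition
            + B["student_faculty"] * student_faculty)


# HYPOTHETICAL baseline college, used only for illustration.
base = dict(year_founded=1900, avg_sat=1000, tuition=5000, student_faculty=15)

# Same college with SAT scores 200 points higher, everything else unchanged.
higher_sat = dict(base, avg_sat=base["avg_sat"] + 200)
sat_effect = predict(**higher_sat) - predict(**base)

# Same college founded 100 years later (i.e., 100 years younger).
younger = dict(base, year_founded=base["year_founded"] + 100)
age_effect = predict(**younger) - predict(**base)

print(round(sat_effect, 2))   # 8.4
print(round(age_effect, 2))   # -2.1
```

Because the model is linear, these differences are the same whatever baseline college is chosen.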
Multiple Regression:
Comparative Politics – Stata - 1
Let’s examine the impact of government
ideology on economic growth in 18
wealthy democracies (Western Europe,
the United States, Canada, Japan,
Australia and New Zealand) annually
over the 1961-1994 period.
Comparative Politics - 2
Variable List:
growthpc – annual growth of per capita
(i.e., per person) gross domestic product
govcons – strength of the conservative
party in the national government
left – strength of the left party in the
national government
Comparative Politics - 3
gdppc – per capita gross domestic product
unem – unemployment rate
Comparative Politics - 4
      Source |       SS       df       MS              Number of obs =     453
-------------+------------------------------           F(  4,   448) =   16.56
       Model |  272.295407     4  68.0738517           Prob > F      =  0.0000
    Residual |  1841.26412   448  4.10996456           R-squared     =  0.1288
-------------+------------------------------           Adj R-squared =  0.1211
       Total |  2113.55953   452  4.67601666           Root MSE      =  2.0273

------------------------------------------------------------------------------
    growthpc |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     govcons |   -.168093   .0380607    -4.42   0.000    -.2428933   -.0932942
        left |    .001841   .0034541     0.53   0.594    -.0049468    .0086298
       gdppc |   -.000157   .0000585    -2.70   0.007    -.0002725   -.0000428
        unem |   -.086520   .0458576    -1.89   0.060     -.176643    .0036023
       _cons |   7.501013   .7285216    10.30   0.000     6.069269    8.932757
------------------------------------------------------------------------------
What do these results indicate?
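One way to read the last two columns: a 95% confidence interval is the coefficient plus or minus a critical t value times the standard error, and a variable is significant at the 5% level exactly when that interval excludes zero. The sketch below rebuilds the intervals from the Coef. and Std. Err. columns. The critical value 1.9653 is an assumption (the approximate two-sided 5% value for 448 degrees of freedom), and the names are illustrative.

```python
# (coefficient, standard error, reported 95% interval) from the Stata output.
ROWS = {
    "govcons": (-0.168093, 0.0380607, (-0.2428933, -0.0932942)),
    "left": (0.001841, 0.0034541, (-0.0049468, 0.0086298)),
    "gdppc": (-0.000157, 0.0000585, (-0.0002725, -0.0000428)),
    "unem": (-0.086520, 0.0458576, (-0.176643, 0.0036023)),
}

# ASSUMPTION: approximate two-sided 5% critical value of t with 448 df.
T_CRIT = 1.9653


def conf_interval(coef, se, t_crit=T_CRIT):
    """95% confidence interval: coefficient +/- t_crit * standard error."""
    half_width = t_crit * se
    return coef - half_width, coef + half_width


def contains_zero(interval):
    low, high = interval
    return low <= 0.0 <= high


not_significant = [
    name for name, (coef, se, _) in ROWS.items()
    if contains_zero(conf_interval(coef, se))
]
print(not_significant)  # ['left', 'unem']
```

The intervals for left and unem straddle zero, which agrees with their t ratios (0.53 and -1.89) falling inside the -2 to 2 band.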
Multicollinearity Check
vif

    Variable |       VIF       1/VIF
-------------+----------------------
     govcons |      1.37    0.730762
        unem |      1.31    0.763241
       gdppc |      1.29    0.776446
        left |      1.20    0.834291
-------------+----------------------
    Mean VIF |      1.29
Low multicollinearity: the highest VIF is for govcons
(27% of the variance in govcons is explained by the
other independent variables:
1 - .73 = .27, thus “low”).
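The “1 - .73 = .27” step generalizes to every row of the table: 1/VIF is the share of a variable's variance NOT explained by the other independent variables, so 1 minus that number is the share that is explained. A minimal sketch using the 1/VIF column from the output (names in the code are illustrative):

```python
# 1/VIF ("tolerance") column copied from the Stata vif output.
TOLERANCE = {
    "govcons": 0.730762,
    "unem": 0.763241,
    "gdppc": 0.776446,
    "left": 0.834291,
}

# VIF is the reciprocal of the tolerance.
vif = {name: 1 / tol for name, tol in TOLERANCE.items()}

# Share of each variable's variance explained by the other IVs.
explained_by_others = {name: 1 - tol for name, tol in TOLERANCE.items()}

mean_vif = sum(vif.values()) / len(vif)

for name in TOLERANCE:
    print(f"{name:>8}  VIF = {vif[name]:.2f}  "
          f"explained by others = {explained_by_others[name]:.2f}")
print(f"Mean VIF = {mean_vif:.2f}")
```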
Nonlinear Models - 1
While many/most variable relationships in
political science are reasonably well
approximated by the linear relationships
shown on the next slide, some are not.
Nonlinear Models - 2
The next slide shows a negative nonlinear
relationship between OSHA
expenditures and the workplace injury
rate. What theory would lead us to think
that: (1) the relationship between OSHA
expenditures and the workplace injury
rate would be negative; (2) that the
relationship would be nonlinear? What
form should the nonlinearity take?
Nonlinear Models - 3
Nonlinear Models - 4
DON’T WORRY ABOUT THE MATH!
Since the rate of change decreases (i.e., the
injury rate decreases but at a slower rate for
each additional dollar spent on OSHA
inspections), we can estimate a linear
relationship by converting the OSHA budget to
natural logarithms. Thus, an OSHA budget of 10 (i.e.,
$10,000,000) is read as 2.3 (i.e., base “e” =
2.71828 and 2.71828^2.3 ≈ 10).
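For anyone who does want to see the math, here is a short sketch of the log conversion. The budget of 10 comes from the slide; the budgets of 20 and 40 are hypothetical values added to show why logs capture a slowing rate of change: each doubling of the budget adds the same amount on the log scale.

```python
import math

# An OSHA budget of 10 (i.e., $10,000,000) on the natural-log scale.
log_budget = math.log(10)
print(round(log_budget, 1))   # 2.3

# Going the other way: e raised to 2.3 is approximately 10.
back_to_budget = math.exp(2.3)
print(round(back_to_budget, 2))

# HYPOTHETICAL budgets of 20 and 40: each doubling is the same-sized step
# in logs, so later dollars move the logged budget less than earlier dollars.
step_10_to_20 = math.log(20) - math.log(10)
step_20_to_40 = math.log(40) - math.log(20)
print(round(step_10_to_20, 3), round(step_20_to_40, 3))
```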
Nonlinear Models - 5
The next slide shows the relationship
between economic development and
political violence. What form should
such a relationship take? Should we
expect the relationship to change
direction (i.e., from negative to positive
or vice versa)? Why? How would you
measure the variables?
Nonlinear Models - 6
Nonlinear Models - 7
The next several slides examine nonlinear
models from the comparative politics literature
on political violence. The dependent variable
is the death rate in a nation from political
violence or violent acts (e.g., riots).
Nonlinear Models - 8
Nonlinear Models - 9
Nonlinear Models - 10
Nonlinear Models - 11
The next slide shows a graph in which the
dependent variable (Y axis) is the percentage
of elected county officials who are African-American and the independent variable (X
axis) is the percentage of the county voters
who are African-American. What would you
expect the graph to look like? How many
“changes of direction” (positive to negative or
vice versa) in the relationship would you
expect?
Nonlinear Models - 12
North Carolina
      Source |       SS       df       MS              Number of obs =     300
-------------+------------------------------           F(  4,   295) =   83.90
       Model |  8422.69127     4  2105.67282           Prob > F      =  0.0000
    Residual |   7404.1454   295   25.098798           R-squared     =  0.5322
-------------+------------------------------           Adj R-squared =  0.5258
       Total |  15826.8367   299  52.9325641           Root MSE      =  5.0099

------------------------------------------------------------------------------
      blktot |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      blkreg |   .9915165   .1630062     6.08   0.000      .670714    1.312319
    blkregsq |   -.037464   .0071142    -5.27   0.000     -.051465    -.023463
   blkregcub |   .0005588     .00009     6.21   0.000     .0003817    .0007359
        wall |  -.1548252   .0395056    -3.92   0.000    -.2325737   -.0770767
       _cons |      1.051   .9752407     1.08   0.282     -.868311    2.970311
------------------------------------------------------------------------------
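With squared and cubed terms in the model, the effect of blkreg is no longer a single number: the slope changes with the level of blkreg. The sketch below assumes, from the variable names only, that blkregsq and blkregcub are the square and cube of blkreg, and it holds wall fixed. Under that assumption the slope at any level x is b1 + 2*b2*x + 3*b3*x^2. Function names and the evaluation points are illustrative.

```python
# Coefficients copied from the North Carolina Stata output.
B1 = 0.9915165    # blkreg
B2 = -0.037464    # blkregsq  (ASSUMED to be blkreg squared)
B3 = 0.0005588    # blkregcub (ASSUMED to be blkreg cubed)


def slope_at(x):
    """Predicted change in blktot per one-unit rise in blkreg, at level x."""
    return B1 + 2 * B2 * x + 3 * B3 * x ** 2


# Level of blkreg where the slope is smallest (the curve's inflection point).
flattest_x = -B2 / (3 * B3)

# The fitted curve changes direction only if the slope crosses zero, i.e. only
# if this discriminant of the slope's quadratic is positive.
discriminant = (2 * B2) ** 2 - 4 * (3 * B3) * B1

for x in (0, 10, 20, 50):
    print(x, round(slope_at(x), 3))
print(round(flattest_x, 1), round(slope_at(flattest_x), 3))
print(discriminant < 0)
```

With these coefficients the slope is steep at low levels, flattens out around a blkreg of 22, and then steepens again, but it stays positive throughout: the fitted curve bends without reversing direction.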
Interaction Terms - 1
If our theory indicates that the impact of
one independent variable on the
dependent variable changes as the level
of ANOTHER independent variable
changes, we need an interaction term.
We simply multiply the scores on the two
independent variables and create a new
independent variable.
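Creating the interaction term is a one-line operation. The sketch below uses made-up scores and made-up coefficients purely for illustration; it also shows why the term matters: in a model y = b0 + b1*x1 + b2*x2 + b3*(x1*x2), the effect of x1 is b1 + b3*x2, so it shifts as x2 changes.

```python
def add_interaction(x1, x2):
    """New independent variable: the product of the two scores, case by case."""
    return [a * b for a, b in zip(x1, x2)]


def effect_of_x1(b1, b3, x2_level):
    """Effect of a one-unit rise in x1 when the other variable equals x2_level."""
    return b1 + b3 * x2_level


# HYPOTHETICAL scores on two independent variables.
x1 = [1, 2, 3]
x2 = [0, 1, 2]
print(add_interaction(x1, x2))   # [0, 2, 6]

# HYPOTHETICAL coefficients: b1 on x1, b3 on the interaction term.
print(effect_of_x1(0.5, 0.25, x2_level=0))   # 0.5
print(effect_of_x1(0.5, 0.25, x2_level=4))   # 1.5
```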
Interaction Terms - 2
Interaction Terms - 3
The Impact of Outliers
The next two slides show the impact of
outlier (i.e., extreme) data. The
argument that a lower corporate tax rate
will actually raise more revenue is based
on this conundrum. Spotting outliers is
one of the reasons graphical analysis is
useful. We sometimes re-run analyses
removing an extreme score to see how
fragile the initial results are.
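Re-running an analysis without an extreme score can be sketched as follows. The data are made up for illustration (five points on a rising line plus one extreme case) and are not the corporate tax data from the slides; the function name is hypothetical.

```python
def ols_slope(x, y):
    """Least-squares slope for a regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


# HYPOTHETICAL data: the last case is an extreme score on x with a low y.
x = [1, 2, 3, 4, 5, 20]
y = [2, 4, 6, 8, 10, 0]

slope_all = ols_slope(x, y)
slope_without_outlier = ols_slope(x[:-1], y[:-1])

print(round(slope_all, 3))               # negative
print(round(slope_without_outlier, 3))   # 2.0
```

Here a single case is enough to flip the sign of the slope, which is the kind of fragility a scatterplot would reveal at a glance.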
Outlier Omitted
Causal Models – Presidents and
the Economy - 1
20th Percentile (Dep. Variable: Growth Rate)

Democratic President         2.32 (.80)
Oil Prices (% lagged)        -.032 (.016)
Labor Force Participation    4.66 (1.44)
Lagged Growth                -.191 (.084)
Linear Trend                 -12.84 (5.88)
Quadratic Trend              9.68 (5.75)
Intercept                    2.68 (1.26)
R-Squared                    .41
Causal Models – Presidents and
the Economy - 2
Impact of Democratic President across
Income Groups:
20th Percentile: 2.32 (.80)
40th Percentile: 1.60 (.56)
60th Percentile: 1.53 (.52)
80th Percentile: 1.23 (.51)
95th Percentile: .50 (.64)
Causal Models – Presidents and
the Economy - 3
20th Percentile (Dep. Variable: Growth Rate)

Democratic President         .51 (.64)
Unemployment (%)             -.849 (.307)
Inflation (%)                -.134 (.127)
GNP Growth (%)               .798 (.144)
Oil Prices (% lagged)        -.005 (.013)
Why are the results different? Does the
partisanship of the President matter? (YES!)
Regression – Presidents and the
Economy - 4
Democratic           >>>> unemployment >>>>   income growth
Presidential Adm.    >>>> inflation    >>>>   rate, 20th
                     >>>> GNP growth   >>>>   percentile