Testing statistical significance of
differences between coefficients
Jane E. Miller, PhD
The Chicago Guide to Writing about Multivariate Analysis, 2nd Edition.
Overview
• Review: Inferential statistical tests for coefficients
• Testing statistical significance of differences
– Between coefficients in the same model
– Between coefficients in independent models
• Standard error of the difference
• Presenting results of tests of differences between
coefficients
Review: Statistical significance of βs
• In the standard output from a regression model,
inferential statistics provide the information to test
whether the coefficient on an independent variable
is statistically significantly different from zero
• For continuous independent variables
– Whether the marginal effect of a one-unit increase in that
IV is different from zero
• For categorical independent variables
– Whether the difference between the mean of the DV for the
specified group and that for the reference category is
different from zero
Estimated coefficients from an OLS model of birth weight in grams

Variable                                 Coefficient   Standard error
Intercept                                3,317.8**     25.1
Mother’s age at child’s birth (years)    10.7**        1.2
Mother’s education
  < High school (<HS)                    –55.5**       19.3
  = High school (=HS)                    –53.9**       14.8
  (> High school; >HS)

** denotes p < 0.01
Reference category in parentheses
Example: β on a continuous IV
• OLS model of birth weight in grams includes
mother’s age in years as an independent variable
• βmother’s age = 10.7 with a standard error (s.e.) of 1.2, p
< 0.001
– Thus we reject the null hypothesis
H0: βmother’s age = 0
• We conclude that the slope of the association
between mother’s age and birth weight is statistically
significantly different from zero at p < 0.001
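The test above can be sketched in a few lines of Python. This is a minimal illustration, not output from the original model: the function name `z_test` is hypothetical, and the p-value assumes a standard normal reference distribution (a t-test with very large degrees of freedom), because the slides do not report the sample size.

```python
import math

def z_test(coef, se):
    """Return (test statistic, two-sided p-value) for H0: coefficient = 0.

    Assumption: the standard normal distribution is used as the reference
    distribution, which approximates a t-test with very large degrees of
    freedom. The sample size is not given in the slides.
    """
    stat = coef / se
    # Two-sided tail area of the standard normal: 2 * (1 - Phi(|stat|)).
    p_value = math.erfc(abs(stat) / math.sqrt(2))
    return stat, p_value

# Mother's age: coefficient 10.7, standard error 1.2 (from the table).
stat, p_value = z_test(10.7, 1.2)
print(f"test statistic = {stat:.2f}, p < 0.001: {p_value < 0.001}")
```

The same function applies to any coefficient and standard error pair in the table.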
Example: β on a categorical IV
• The birth weight model includes an ordinal measure
of mother’s educational attainment
• > HS is the reference category
• β<HS = –55.5 with a standard error (s.e.) of 19.3, p < 0.01
– Thus we reject the null hypothesis
H0: β<HS = 0
i.e., H0: mean birth weight for < HS = mean birth weight for > HS
• Mean birth weight for infants born to mothers with
< high school education is statistically significantly
different from that for infants born to mothers with > HS
Testing other hypotheses
• For some research questions, you might need to test
a hypothesis in addition to H0: βi = 0. E.g., whether
– Two βs in a given model are statistically significantly
different from one another
• E.g., H0: β<HS = β=HS
– The size and statistical significance of a β changes across
models when additional covariates such as confounders or
mediators are included in the model
• E.g., H0: βnon-Hispanic black (I) = βnon-Hispanic black (II)
– The effect of a covariate differs across models estimated
for independent subgroups (stratified models)
• E.g., H0: β<HS is the same for males as for females
Testing statistical significance of
differences between coefficients
• To formally test statistical significance of differences
between coefficients, e.g., H0: βj = βk
– Divide the difference between the estimated coefficients
(βj − βk) by the standard error of the difference to obtain
the test statistic
– Compare the calculated test statistic against the pertinent
critical value with one degree of freedom
Standard error of the difference
• The standard error of the difference is calculated:
s.e.(βj − βk) = √[var(βj) − (2 × cov(βj, βk)) + var(βk)]
– var(βj) and var(βk) are the variances of βj and βk
– cov(βj, βk) is the covariance between βj and βk
• When βj and βk are from different models
– Considered statistically independent of one another
• cov(βj, βk) = 0
• When βj and βk are from within one regression model
– Not independent of one another
• cov(βj, βk) ≠ 0
Testing differences of βs from one model
• When βj and βk are from the same model, you must
include the covariance in the calculation of the
standard error of the difference
√[var(βj) − (2 × cov(βj, βk)) + var(βk)]
• The complete variance-covariance matrix for a
regression can be requested as part of the output
• The variance of each coefficient can be calculated
from its standard error (s.e.)
var(βj) = [s.e.(βj)]²
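As a rough sketch, the standard error of a difference can be wrapped in a small Python helper. It relies on the general identity var(a − b) = var(a) + var(b) − 2 × cov(a, b). The function names `variance_from_se` and `se_difference` are hypothetical, and the numbers in the usage lines are made up purely for illustration; they are not from the birth weight model.

```python
import math

def variance_from_se(se):
    """Variance of a coefficient, computed from its reported standard error."""
    return se ** 2

def se_difference(var_j, var_k, cov_jk=0.0):
    """Standard error of the difference between two coefficients.

    var(b_j - b_k) = var(b_j) + var(b_k) - 2 * cov(b_j, b_k).
    Leave cov_jk at 0.0 when the coefficients are treated as independent
    (for example, when they come from different models).
    """
    return math.sqrt(var_j + var_k - 2.0 * cov_jk)

# Illustrative (made-up) values: variances 4.0 and 9.0, covariance 2.0.
same_model = se_difference(4.0, 9.0, cov_jk=2.0)   # sqrt(4 + 9 - 4)
independent = se_difference(4.0, 9.0)              # sqrt(4 + 9)
print(same_model, independent)
```

With a positive covariance, the same-model standard error is smaller than it would be if the covariance were ignored, which is why the covariance term matters for coefficients from one model.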
Example: Testing whether β<HS = β=HS
• From the table, β<HS = –55.5 and β=HS = –53.9
• The difference between β<HS and β=HS is calculated
β<HS – β=HS = –55.5 – (–53.9) = –1.6
• For that model,
• var(β<HS) = 370.9
• var(β=HS) = 218.8
• cov(β<HS, β=HS) = 137.8
• Plugging those values into the formula for the standard error
of the difference yields
s.e.(β<HS – β=HS) = √[370.9 – (2 × 137.8) + 218.8]
= √314.1 = 17.72
Example, cont.: Testing β<HS = β=HS
• To calculate the test statistic, divide the difference
between β<HS and β=HS by the standard error of the
difference:
(β<HS – β=HS)/s.e.(β<HS – β=HS)
= –1.6/17.7 = –0.09
• |–0.09| < 1.96 (the critical value for a t-test with
∞ degrees of freedom at p < 0.05)
• Cannot reject the null hypothesis that β<HS = β=HS
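The worked example can be checked with a short Python sketch. The coefficients, variances, and covariance are the values reported on the slides; the variable names are illustrative, and the covariance is subtracted, following var(a − b) = var(a) + var(b) − 2 × cov(a, b).

```python
import math

# Values reported for the birth weight model.
b_lt_hs, b_eq_hs = -55.5, -53.9          # coefficients on <HS and =HS
var_lt_hs, var_eq_hs = 370.9, 218.8      # variances of the two coefficients
cov_lt_eq = 137.8                        # covariance between them

# Difference between coefficients and its standard error.
diff = b_lt_hs - b_eq_hs
se_diff = math.sqrt(var_lt_hs + var_eq_hs - 2 * cov_lt_eq)

# Test statistic, compared in absolute value with the 1.96 critical value.
t_stat = diff / se_diff
reject_h0 = abs(t_stat) > 1.96

print(f"difference = {diff:.1f}, s.e. = {se_diff:.2f}, t = {t_stat:.2f}")
print("reject H0" if reject_h0 else "cannot reject H0")
```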
TEST statement: contrasts where neither category is the reference category
• Many software packages can do these calculations
for you
• To test other contrasts among categories, request the
test statistic for equality of coefficients for pairs of
coefficients: H0 : βj = βk
– E.g., to test whether predicted birth weight is statistically
significantly different for infants born to mothers with < HS
than for those with = HS
• Specify “TEST ‘<HS’ = ‘=HS’” in your SAS syntax
• Output for H0: β<HS = β=HS reports an F-statistic of 0.01
with a p-value of 0.93
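The F-statistic reported by the software can be related to the hand calculation: for a test of a single restriction (one numerator degree of freedom), F equals the square of the t-statistic. The sketch below is an approximation under a stated assumption, namely that the p-value comes from the standard normal distribution (very large degrees of freedom), since the sample size is not given in the slides.

```python
import math

# Hand-calculated inputs: difference of -1.6 and standard error of 17.72.
t_stat = -1.6 / 17.72

# With one numerator degree of freedom, F is the square of t.
f_stat = t_stat ** 2

# Two-sided p-value, assuming a standard normal reference distribution
# (large-sample approximation; an assumption, not taken from the SAS output).
p_value = math.erfc(abs(t_stat) / math.sqrt(2))

print(f"F = {f_stat:.2f}, p = {p_value:.2f}")
```

Under that approximation, the rounded values agree with the F-statistic of 0.01 and p-value of 0.93 reported on the slide.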
Testing differences of βs from
independent models
• When βj and βk are from different models they can be
assumed to be independent of one another
cov(βj, βk) = 0
• Thus the formula for the standard error of the difference
√[var(βj) – (2 × cov(βj, βk)) + var(βk)]
simplifies to
√[var(βj) + var(βk)]
• Reminder: var(βj) and var(βk) can be calculated from the
standard error reported in the regression output
var(βj) = [s.e.(βj)]²
Example: Change in βs across nested models
• In nested models I and II, βs on non-Hispanic black (NHB) are
βNHB(I) = –244.5, s.e. = 16.7
βNHB(II) = –147.2, s.e. = 17.6
• The change in β between models I and II:
βNHB(II) – βNHB(I) = –147.2 – (–244.5) = 97.3
• Plugging the standard errors for βNHB(I) and βNHB(II) into
the formula for standard error of the difference yields
s.e.(difference) = √[(16.7)² + (17.6)²]
= 24.3
Change in βs across nested models, cont.
• The t statistic for the difference in β is calculated:
(difference in β)/ s.e.(difference in β)
• Plugging in the values from the previous slide:
97.3 ÷ 24.3 = 4.01
• 4.01 exceeds the critical value of 2.58 for p < 0.01, so
we conclude that the change in βNHB between models
I and II is statistically significant at p < 0.01
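The nested-model calculation can be reproduced with a short Python sketch. It follows the slides in treating the two coefficients as independent, so the covariance term is set to zero; the variable names are illustrative, and 2.58 is the two-sided critical value at p < 0.01 for very large degrees of freedom.

```python
import math

# Coefficients and standard errors on non-Hispanic black in models I and II.
b_model_1, se_model_1 = -244.5, 16.7
b_model_2, se_model_2 = -147.2, 17.6

# Change in the coefficient from model I to model II.
change = b_model_2 - b_model_1

# Standard error of the difference, with covariance assumed to be zero.
se_change = math.sqrt(se_model_1 ** 2 + se_model_2 ** 2)

# Test statistic, compared with the 2.58 critical value for p < 0.01.
t_stat = change / se_change
significant_at_01 = abs(t_stat) > 2.58

print(f"change = {change:.1f}, s.e. = {se_change:.1f}, t = {t_stat:.2f}")
```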
Tables to present multivariate results
• In the table of multivariate statistics, for each
independent variable in the model, present
– The estimated coefficient (β)
– The standard error
• See chapters 5 and 11 of Writing about Multivariate
Analysis, 2nd Edition for guidelines and examples of
multivariate tables
Prose to present results of differences
between coefficients
• Introduce the substantive reason behind the test for
difference between βs, given your
– Research question
– Variables (categories, units)
• Report and interpret the results of the formal
statistical test of difference between coefficients
– Test statistic
– Accompanying degrees of freedom
• Explain the conclusions you draw from that test
about specification of your model
Poor presentation:
Results of tests of differences between βs
• “From table 15.3, Model III we have β<HS = –55.5 and β=HS = –53.9,
so the difference between β<HS and β=HS is β<HS – β=HS =
–55.5 – (–53.9) = –1.6. For that model, var(β<HS) = 370.9, var(β=HS) =
218.8, and cov(β<HS, β=HS) = 137.8. Plugging those values into the
formula for the standard error of the difference yields √[370.9 – (2 ×
137.8) + 218.8] = 17.7. To calculate the test statistic, divide the
difference between β<HS and β=HS by the standard error of the
difference: (β<HS – β=HS)/s.e.(β<HS – β=HS) = –1.6/17.7 = –0.09, which is
smaller in absolute value than the critical value of 1.96 for a t-test
with ∞ degrees of freedom at p < 0.05. Thus we cannot reject the null
hypothesis that β<HS = β=HS.”
– Except for a course assignment in which you must demonstrate that you
know this logic, spare your readers the statistics lesson!
Better presentation:
Results of tests of differences between βs
• “The 1.6 unit (gram) birth weight difference between the
estimated coefficients for ‘less than high school’ and ‘high
school graduate’ in Model III is not statistically significant
(F-statistic for the test of difference = 0.01; p = 0.93).”
– Mentions the
• Dependent variable
• Independent variable (educational attainment)
• Units or categories
• Purpose of the test (difference between β<HS and β=HS)
• Magnitude
• Statistical significance
• Direction (not mentioned because trivially small)
Example presentation:
Change in β across nested models
• “As shown in table 15.3, the coefficient on non-Hispanic black
decreases in magnitude by 97.3 grams, from –244.5 in model I to –147.2
in model II (t = 4.01; p < 0.01). Thus, the addition of controls
for socioeconomic characteristics is associated with a large,
statistically significant decrease in the birth weight deficit for
non-Hispanic black compared to non-Hispanic white infants.”
– Mentions the
• Dependent variable
• Independent variables and their units or categories
• Purpose of the test for a change in βNHB across nested models
• Direction
• Magnitude
• Statistical significance
Summary
• To test hypotheses other than H0: βi = 0, calculate a
test statistic from the difference in coefficients and
the standard error of the difference
• Compare that test statistic against the critical value
• βs from different models are considered statistically
independent of one another, so the covariance is not
needed to compute standard error of the difference
• E.g., nested models, stratified models
• βs from the same model are not statistically
independent of one another, so the covariance is
needed to compute standard error of the difference
Summary, cont.
• If coefficients are not statistically significantly
different from one another, the model specification
often can be simplified by combining terms
• Then test effect of simplified specification on overall
fit using model GOF statistics
• Present results of difference between coefficients
– Use a combination of tables and prose
– Describe conclusions, not process
– Relate to topic at hand
Suggested resources
• Miller, J. E. 2013. The Chicago Guide to Writing about
Multivariate Analysis, 2nd Edition. University of
Chicago Press. Chapters 11 and 15.
• Freedman, David, Robert Pisani, and Roger Purves.
2007. Statistics, 4th Edition. New York: W. W. Norton.
• Gujarati, Damodar N. 2002. Basic Econometrics, 4th
Edition. New York: McGraw-Hill/Irwin.
Suggested online resources
• Podcasts on
– Interpreting coefficients from OLS and logit
models
– Comparing overall goodness of fit across models
– Testing whether a multivariate specification can
be simplified
Suggested practice exercises
• Study guide to The Chicago Guide to Writing
about Multivariate Analysis, 2nd Edition.
– Questions #2, 3, and 5 in the problem set for
chapter 11
– Suggested course extensions for chapter 11
• “Reviewing” exercise #2
• “Applying statistics and writing” exercise #3
Contact information
Jane E. Miller, PhD
jmiller@ifh.rutgers.edu
Online materials available at
http://press.uchicago.edu/books/miller/multivariate/index.html