Session 4

Education 795 Class Notes
P-Values, Partial Correlation,
Multi-Collinearity
Note set 4
Today’s Agenda
Announcements (ours and yours)
Q/A?
Leveraging what we already know
Partial Correlation and Multi-Collinearity
P-Values
“p-value refers to the probability of the
evidence having arisen as a result of
sampling error given that the null
hypothesis is true” (Pedhazur &
Schmelkin, 1991)
What is inherently wrong with p-values?
Why do we use them?
P-Values
“Even though I am very critical of statistical
inference… I shall probably continue to pay
homage to “tests of significance” in the
papers I submit to psychological journals. My
rationale for this admitted hypocrisy is
straightforward: until the rules of the science
game are changed, one must abide by at
least some of the old rules, or drop out of the
game” (Mahoney, 1976, p. xiii)
What to do?
“Perhaps p values are like mosquitos.
They have an evolutionary niche
somewhere and no amount of
scratching, swatting, or spraying will
dislodge them” (Campbell, 1982, p. 698)
Statistical Significance vs.
Practical Significance
We should refrain from what Tukey calls
“statistical sanctification.” Concern with
practical significance is addressed
through effect sizes or relational
magnitudes (betas in regression).
“A difference is a difference only if it
makes a difference” (Huff, 1954, p. 58)
Introduction to Effect Size
Effect sizes imply strength of
meaningfulness or importance
General Rule set forth by Cohen (1988)
for small, medium, large ES
We will address how effect sizes are
computed later in the course
Transition Back to
Multiple Regression
1. Multiple predictors typically yield better
technical solutions (e.g., higher R2)
2. Multiple predictors provide opportunities to
test more realistic models (e.g., why is
nothing as simple as it should be?)
3. Multiple regression models allow for an
examination of more complex research
hypotheses than is possible with simple
regression / correlation approaches
Regression
Raw score depiction:
Y' = a + b1X1 + b2X2 + ... + bkXk
where each b:
is the unique and independent contribution of that
predictor to the model
for quantitative IVs, the expected direction and
amount of change in the DV for each unit change in
the IV, holding all other IVs constant
For dichotomous IVs, the direction and amount of
group mean difference on DV, holding all other IVs
constant
Revisit b’s
Example:
Dependent Variable: Promote Racial Understanding
Independent Variable: Sex, Race
b_sex = r_sex,promote if sex and race are not correlated.
These are population-based estimates, and they are
“effect sizes” because we can compare the relative
strength of predictors in the model
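A brief sketch of why this holds, written in standardized (beta) form for two predictors; the reduction to the zero-order r when the predictors are uncorrelated is the point:

```latex
% Standardized weight for X_1 with two predictors
\beta_1 = \frac{r_{y1} - r_{y2}\, r_{12}}{1 - r_{12}^{2}}
\qquad \Longrightarrow \qquad
\beta_1 = r_{y1} \quad \text{when } r_{12} = 0
```

With sex and race uncorrelated, the weight for sex is simply its correlation with the promote-racial-understanding outcome.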
In the Venn diagram on the following slide, note X1
and X2 are not correlated but X2 and X3 are
Venn Diagram Depiction
(Venn diagram contrasting correlations with regression coefficients for X1, X2, and X3, where X1 and X2 are uncorrelated but X2 and X3 are correlated)
Warning
Pedhazur believes that the topics of
partial correlations and semi-partial
correlations can be confusing and lead
to misinterpretations of regression
coefficients. Why talk about them?
Awareness and enough knowledge to
evaluate research where partials are used
Partial Correlations
A variation on the idea of residualization
(removal of the predictable part of y from y)
First-order partial correlation:
the correlation of variables 1 and 2 after partialling
variable 3 out of both 1 and 2
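For reference, the standard formula for a first-order partial correlation (partialling variable 3 from both 1 and 2) is:

```latex
r_{12.3} = \frac{r_{12} - r_{13}\, r_{23}}
                {\sqrt{\left(1 - r_{13}^{2}\right)\left(1 - r_{23}^{2}\right)}}
```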
Plug and Chug
r        Quiz    Exam    Speed   Motiv
Quiz     1.00    .40     .35     .25
Exam             1.00    .45     .30
Speed                    1.00    .15
Motiv                            1.00

1. What is the correlation between quiz and exam score, controlling for test-taking speed?
2. What is the correlation between exam score and motivation, controlling for test-taking speed?
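As a check on question 1, plug the table values into the first-order partial formula above (quiz = 1, exam = 2, speed = 3); question 2 follows the same pattern using .30, .45, and .15:

```latex
r_{12.3} = \frac{.40 - (.35)(.45)}
                {\sqrt{\left(1 - .35^{2}\right)\left(1 - .45^{2}\right)}}
         = \frac{.2425}{\sqrt{(.8775)(.7975)}}
         \approx .29
```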
Semi-Partial Correlations
r_1(2.3) = the correlation of variables 1 and 2 after
partialling variable 3 from variable 2 only
(semi-partial)
vs.
r_12.3 = the correlation of variables 1 and 2 after
partialling variable 3 from both variable 1 and
variable 2 (partial)
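The formulas make the contrast concrete: the semi-partial removes variable 3 from variable 2 only, so only that one term appears in the denominator:

```latex
r_{1(2.3)} = \frac{r_{12} - r_{13}\, r_{23}}{\sqrt{1 - r_{23}^{2}}}
\qquad \text{vs.} \qquad
r_{12.3} = \frac{r_{12} - r_{13}\, r_{23}}
                {\sqrt{\left(1 - r_{13}^{2}\right)\left(1 - r_{23}^{2}\right)}}
```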
Before Jumping Into Regression
Examine the data using common sense
(e.g., are the data appropriate for
producing interpretable correlation
coefficients?) as well as standard
diagnostic procedures
Review the correlations (r's) among the
predictors for collinearity problems
Multicollinearity
Multicollinearity refers to correlations among the independent
variables only
Multicollinearity is measured by the tolerance statistic, defined
as 1 – R2 predicting each predictor using all other predictors
(values close to 1 are better, values close to 0 are bad)
Excessive collinearity (even singularity – perfect correlation
between two or more IVs) suggests that predictors have
extensive overlaps, and we may need to be selective in picking
predictors or combining them (through factor analytic
techniques)
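A minimal sketch of the tolerance calculation defined above, assuming the predictors sit in a pandas DataFrame (the DataFrame and column names are placeholders for illustration, not the course dataset):

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

def tolerances(predictors: pd.DataFrame) -> pd.Series:
    """Tolerance = 1 - R^2 from regressing each predictor on all the others."""
    tol = {}
    for col in predictors.columns:
        X = predictors.drop(columns=col)   # all other predictors
        y = predictors[col]                # the predictor being checked
        r2 = LinearRegression().fit(X, y).score(X, y)
        tol[col] = 1.0 - r2                # values near 0 flag collinearity trouble
    return pd.Series(tol)

# Hypothetical usage with made-up column names:
# print(tolerances(df[["sex", "family_income", "mother_educ", "father_educ"]]))
```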
Dangers
Multicollinearity has adverse effects on
regression analysis
High multicollinearity leads to a
reduction in the magnitude of the b’s
High multicollinearity leads to inflated
se’s, reducing the t-ratios for the
coefficients
Solutions
Be selective in choosing among variables
that are related
Combine like variables into an index
using scales or ‘factor analysis’, which
we will talk about soon
Suppressors
When a partial correlation is larger than the
original (zero-order) r, it is considered to be
the result of a suppressor effect
Suppressor variables effectively mask
(suppress) the relationship between other
variables
This effect occurs when there is an
unbalanced mix of +/- correlations between
the DV and the IVs
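A small numeric illustration (values invented for illustration): let r12 = .30, r13 = .00, and r23 = .50. Then

```latex
r_{12.3} = \frac{.30 - (.00)(.50)}
                {\sqrt{\left(1 - .00^{2}\right)\left(1 - .50^{2}\right)}}
         = \frac{.30}{\sqrt{.75}}
         \approx .35 > r_{12} = .30
```

Partialling variable 3 raises the correlation even though variable 3 is unrelated to variable 1, the classical suppressor pattern.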
Project Activity
Dataset: Choose a dataset and run a multiple regression
Dependent variable: SATC = SATM + SATV
Independent variables: sex, family income,
mother’s education, and father’s education
Use syntax to get the tolerance statistic
Rerun the regression after summing mother’s and father’s
education into one variable. Compare the tolerance statistics
for mother’s and father’s education with that of the summed index.
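A hedged sketch of the activity in Python (statsmodels), assuming an already-loaded DataFrame `df` with hypothetical column names satm, satv, sex, faminc, momed, and daded; tolerance is reported as 1 / VIF:

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# df is assumed to be an already-loaded DataFrame with the columns below.
df["satc"] = df["satm"] + df["satv"]          # DV: combined SAT score

def fit_with_tolerance(data: pd.DataFrame, predictors: list) -> None:
    X = sm.add_constant(data[predictors])
    model = sm.OLS(data["satc"], X).fit()
    print(model.summary())
    # Tolerance = 1 / VIF; column 0 of X is the constant, so start at 1.
    for i, name in enumerate(predictors, start=1):
        print(name, 1.0 / variance_inflation_factor(X.values, i))

# Run 1: mother's and father's education entered separately
fit_with_tolerance(df, ["sex", "faminc", "momed", "daded"])

# Run 2: the two education variables summed into one index
df["pared"] = df["momed"] + df["daded"]
fit_with_tolerance(df, ["sex", "faminc", "pared"])
```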
For Next Week
Read Pedhazur Ch 10 p211-216
Read Pedhazur Ch 14 p304-310
Read Pedhazur Ch 19 p464-466
Read Pedhazur Ch 21 p545-558, p567-579