Exploring Multivariate Analysis

Beyond Bivariate: Exploring
Multivariate Analysis
3 Topics Covered
1. Logic of introducing a third variable
2. Multiple linear regression: Which independent
(predictor) variables are significantly related to
the dependent (outcome) variable?
3. Logistic regression: Binary outcome variable
A Focal Relationship
Residential mobility and school achievement
This is a negative or inverse relationship:
Higher residential mobility → lower achievement
WHY?
The 0-Order Bivariate Relationship
We are going to call our initial bivariate relationship
the 0-order relationship:
Residential mobility → School achievement
Spurious Relationship/Explanation
Could there be variables that are associated with
high levels of residential mobility and with low
school achievement, creating an apparent but
spurious relationship between residential mobility
and achievement — thus EXPLAINING AWAY the
initial bivariate relationship?
Spurious Relationship
Do taller people like action movies more than
shorter people do?
• What is the third variable?
Do days of high lemonade sales have more
drowning fatalities than days with low lemonade
sales?
• What is the third variable?
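The lemonade question can be made concrete with a small simulation. This is a minimal sketch under invented assumptions: hot weather is taken, for illustration only, to be the third variable, and it drives both lemonade sales and drownings while the two have no direct link. The helpers `pearson` and `partial_r` are hand-written for this example; `partial_r` uses the standard first-order partial correlation formula, which removes the part of the association that runs through the control variable.

```python
import math
import random
import statistics

def pearson(x, y):
    """Pearson correlation coefficient r for two equal-length lists."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def partial_r(x, y, z):
    """First-order partial correlation of x and y, holding z constant."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Hypothetical simulated days: temperature drives BOTH sales and drownings;
# sales and drownings are not linked to each other except through temperature.
random.seed(1)
temp = [random.uniform(10, 35) for _ in range(1000)]        # daily high, deg C
lemonade = [20 * t + random.gauss(0, 60) for t in temp]     # cups sold (toy scale)
drownings = [0.2 * t + random.gauss(0, 1.5) for t in temp]  # toy continuous index, not real counts

print(f"0-order r(lemonade, drownings) = {pearson(lemonade, drownings):.2f}")
print(f"partial r controlling for temperature = {partial_r(lemonade, drownings, temp):.2f}")
```

With these assumptions the 0-order correlation should come out clearly positive, while the partial correlation should be close to zero: the third variable explains the relationship away.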
Intervening Variables: Interpretation
What variables can you suggest that “go in
between” residential mobility and school
achievement that might help us understand our
focal relationship better?
• These intervening variables do NOT explain
away the relationship; they clarify why and how
it comes about.
Intervening Variables: Interpretation
Examples
Why do women have lower incomes than men?
• Maybe they have not acquired the technical and
managerial skills that men have.
• Maybe they are less interested in promotions
into management than men are.
(These interpretations suggest that gender
discrimination in salary decisions is not the only
reason women have lower incomes than men.)
The Difference between
Interpretation (Intervening) and
Explanation (Spurious)
height ← Gender → movie preferences
• Gender, the third variable, explains away the
spurious relationship between height and movie
preferences.
Gender → career choices → income
• Career choices, the intervening third variable,
help us interpret the initial relationship
between gender and income.
Specification or Interaction Effects
Sometimes when we introduce a third variable, we
find that the initial bivariate (0-order) relationship is
different for different categories of the third
variable.
Specification: Examples [1]
In research on school achievement we (Prof.
Bootcheck and I) looked at the relationship
between living in a nuclear family and grades.
• For whites, this relationship was positive.
• For all other racial-ethnic categories, there was
no relationship.
Specification: Examples [2]
Can you think of a variable we could introduce into
our analysis of the relationship between residential
mobility and school achievement that might show
different bivariate relationships (one strong, one
absent) for different categories of the third variable?
Specification in a Crosstab
In a crosstab, this specification or interaction effect
would show up as a strong, significant relationship
in the partial table for one category of the layer
(third) variable, and as “Not Significant” in the
partial table for the other category.
In other words, the chi-square for one partial table
is significant, but the chi-square for the other
partial table is not.
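The check described above can be sketched in a few lines, assuming two hypothetical partial 2×2 tables (rows: high/low residential mobility; columns: low/high achievement) for two categories of an unnamed third variable. The counts are invented for illustration. `chi_square` is a hand-written helper that computes the Pearson chi-square statistic without a continuity correction, and 3.841 is the critical value for df = 1 at the .05 level.

```python
CRITICAL_05_DF1 = 3.841  # chi-square critical value, df = 1, alpha = .05

def chi_square(table):
    """Pearson chi-square statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat

# Hypothetical partial tables, one per category of the layer variable.
# Rows: high mobility, low mobility. Columns: low achievement, high achievement.
partials = {
    "category A": [[60, 40], [35, 65]],
    "category B": [[48, 52], [45, 55]],
}

for name, table in partials.items():
    stat = chi_square(table)
    verdict = "significant" if stat > CRITICAL_05_DF1 else "not significant"
    print(f"{name}: chi-square = {stat:.2f} ({verdict} at .05)")
```

Here category A gives a chi-square of about 12.53 and category B about 0.18, which is the pattern of a specification (interaction) effect.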
Suppressed Effects [1]
Introducing a third variable can reveal suppressed
effects: relationships that run in opposite directions
in different categories of the third variable and
cancel each other out in the 0-order table.
Fictitious example: Religious intensity and death
penalty views
• 0-order: There appears to be no relationship.
Suppressed Effects [2]
• When we introduce region (north or south), we
see that the effects are opposite:
  ◦ For people living in the north of this fictitious country,
  high religious intensity goes with opposition to the
  death penalty.
  ◦ For people living in the south, high religious intensity
  goes with support for the death penalty.
• The two opposing relationships cancel each other
out unless we break the data down by region.
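The fictitious region example can be checked with simple percentages. The counts below are made up so that the two regional tables mirror each other exactly, and `pct_support` is an illustrative helper; the point is only to show how opposite partial relationships can add up to no relationship in the pooled (0-order) table.

```python
# Hypothetical counts of (support, oppose) for the death penalty,
# by religious intensity, in each region of the fictitious country.
north = {"high": (30, 70), "low": (70, 30)}
south = {"high": (70, 30), "low": (30, 70)}

def pct_support(counts):
    """Percentage of a group that supports the death penalty."""
    support, oppose = counts
    return 100 * support / (support + oppose)

# 0-order table: add the two regional tables cell by cell.
pooled = {k: tuple(a + b for a, b in zip(north[k], south[k])) for k in ("high", "low")}

for label, table in (("north", north), ("south", south), ("pooled", pooled)):
    print(label, {k: pct_support(v) for k, v in table.items()})
```

Within each region, religious intensity makes a 40-point difference in support, but in opposite directions; pooled, both intensity groups show 50% support.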
Final Possibility: Replication
It is possible that the initial bivariate relationship
persists when we introduce the third variable.
• The partial tables for the categories of the third
(layer) variable look just the same as the initial
two-variable table.
Multivariate or
Multiple Linear Regression
• We specify two or more independent variables.
• Each may have a significant, perhaps moderate or
even strong, correlation with the dependent variable.
• When they are placed in the regression model,
“only the strongest survive.”
• If a variable has no relationship with the DV
independent of its relationship with the other
independent variables, it will not be significant in
the model.
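To see why “only the strongest survive,” here is a minimal simulation sketch, assuming made-up data in which `x2` is just `x1` plus noise and has no effect of its own on `y`. For exactly two predictors, the standardized coefficients (betas) can be computed from the three pairwise correlations, so no regression library is needed. Significance tests are not computed here; the sketch only compares the sizes of the betas.

```python
import math
import random
import statistics

def pearson(x, y):
    """Pearson correlation coefficient r for two equal-length lists."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: y depends only on x1; x2 overlaps heavily with x1.
random.seed(7)
n = 1000
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [a + random.gauss(0, 0.7) for a in x1]
y = [2 * a + random.gauss(0, 1.5) for a in x1]

r_y1, r_y2, r_12 = pearson(y, x1), pearson(y, x2), pearson(x1, x2)

# Standardized coefficients for a two-predictor model, from pairwise correlations.
beta1 = (r_y1 - r_y2 * r_12) / (1 - r_12 ** 2)
beta2 = (r_y2 - r_y1 * r_12) / (1 - r_12 ** 2)
r_squared = beta1 * r_y1 + beta2 * r_y2

print(f"bivariate r: y-x1 = {r_y1:.2f}, y-x2 = {r_y2:.2f}")
print(f"betas in the joint model: x1 = {beta1:.2f}, x2 = {beta2:.2f}; R^2 = {r_squared:.2f}")
```

On its own, `x2` should correlate moderately with `y`, yet its beta should be near zero once `x1` is in the model, because everything `x2` “knows” about `y` comes through its overlap with `x1`.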
Examples from the Country Data Set
Look at adjusted R².
• Which variables have significant coefficients?
• What do the relative sizes of the betas tell you?
Hard to visualize.
Building models: all variables entered at the same
time, or stepwise. See Nardi (2006, p. 97), which
is cited in Garner (2010, p. 333).
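Adjusted R² corrects R² for the number of predictors, so it can go down when a model adds a variable that explains very little. The standard formula, with n cases and k independent variables, is:

```latex
\bar{R}^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - k - 1}
```

For instance, a hypothetical model with R² = 0.50, n = 50 and k = 5 has an adjusted R² of 1 − 0.50 × 49/44 ≈ 0.44.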
Logistic Regression [1]
Currently, logistic regression is a very popular
statistical analysis!
• It involves a dichotomous (or binary) outcome variable.
We can compute the overall odds of one outcome
versus the other.
• It involves examining predictor variables (IVs) to see if
each one is related to a change in the odds from their
overall level.
EXAMPLE:
Does growing up in a bilingual family raise or lower
an individual’s odds of completing high school,
compared to the overall odds of doing so?
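As a rough illustration of odds versus odds ratios, the sketch below uses invented counts for the bilingual-family example; they are not real data. Odds are the number with the outcome divided by the number without it (equivalently p / (1 − p)), and an odds ratio divides one group's odds by another's, so a value above 1 means higher odds for the first group.

```python
def odds(yes, no):
    """Odds of the outcome: count with the outcome / count without it."""
    return yes / no

# Hypothetical counts: (completed high school, did not complete)
bilingual = (180, 20)
monolingual = (700, 100)

overall = odds(180 + 700, 20 + 100)  # everyone pooled: 880 / 120
odds_bi = odds(*bilingual)           # 180 / 20 = 9.0
odds_mono = odds(*monolingual)       # 700 / 100 = 7.0
odds_ratio = odds_bi / odds_mono     # bilingual vs. monolingual

print(f"overall odds = {overall:.2f}")
print(f"odds ratio (bilingual vs. monolingual) = {odds_ratio:.2f}")
```

With these made-up numbers the overall odds are about 7.33 to 1, and the odds ratio of about 1.29 would mean a bilingual upbringing goes with higher odds of completing high school.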
Logistic Regression [2]
Independent variables need to be interval-ratio or
dummy variables (a categorical variable broken
down into binary variables).
Alert: Which categories are defined as 0 and
1 for all the binary variables?
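A minimal sketch of dummy coding, assuming a hypothetical three-category `region` variable. A categorical variable with k categories becomes k − 1 binary variables: the omitted reference category is coded 0 on every dummy, and each dummy's coefficient is read as a comparison with that reference group. The helper `dummy_code` is written for this example only.

```python
def dummy_code(values, reference):
    """Turn one categorical variable into 0/1 dummies, omitting the reference category."""
    # Sorted so the dummy columns come out in a predictable order.
    categories = sorted(set(values) - {reference})
    return {
        f"is_{cat}": [1 if v == cat else 0 for v in values]
        for cat in categories
    }

region = ["north", "south", "west", "south", "north"]
dummies = dummy_code(region, reference="north")
print(dummies)
```

Here “north” is the reference, so a north respondent is 0 on both `is_south` and `is_west`; choosing a different reference changes what every coefficient is compared against.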
Negative coefficients mean lower odds: the odds
ratio, exp(b), falls below 1.
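The link between a coefficient and its odds ratio can be seen by fitting a logistic regression by hand. This is a sketch, not a substitute for statistical software: it uses Newton-Raphson on invented data with a single binary predictor. In that special case, exp(b1) equals the odds ratio from the 2×2 table, so a negative b1 corresponds to an odds ratio below 1. `fit_logistic` is a hypothetical helper.

```python
import math

def fit_logistic(x, y, iterations=25):
    """Fit P(y=1) = 1 / (1 + exp(-(b0 + b1*x))) by Newton-Raphson; returns (b0, b1)."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        # Gradient of the log-likelihood (g) and the information matrix (h).
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1 / (1 + math.exp(-(b0 + b1 * xi)))
            w = p * (1 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: add (information matrix)^-1 times the gradient.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical data: x = 1 for group members, y = 1 if the outcome occurred.
# Group (x=1): 30 yes, 70 no.  Others (x=0): 60 yes, 40 no.
x = [1] * 100 + [0] * 100
y = [1] * 30 + [0] * 70 + [1] * 60 + [0] * 40

b0, b1 = fit_logistic(x, y)
print(f"b1 = {b1:.3f}, odds ratio exp(b1) = {math.exp(b1):.3f}")
```

The table's odds ratio is (30/70) / (60/40) ≈ 0.29, and the fitted b1 is its natural log, a negative number.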
Logistic Regression: Example 1
Are income, race-ethnicity, gender, region, and
religion related to a vote for the Republican
presidential candidate?
• Which characteristics raise the odds of a
Republican vote, and which lower them?
• Which categories are labelled 1? Which 0?
(This will make a difference in how to read the
table of coefficients.)
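To see why the 0/1 coding matters, the sketch below uses hypothetical vote counts for two unnamed groups, A and B; these are not real election data. Switching which group is coded 1 inverts the odds ratio and flips the sign of its natural log, which, for a model with a single binary predictor, is the logistic regression coefficient.

```python
import math

def odds(yes, no):
    """Odds of the outcome: count with the outcome / count without it."""
    return yes / no

# Hypothetical counts: (voted Republican, did not)
group_a = (120, 80)
group_b = (90, 110)

# Coding 1: A = 1, B = 0 (B is the reference category)
or_a_vs_b = odds(*group_a) / odds(*group_b)
# Coding 2: B = 1, A = 0 (A is the reference category)
or_b_vs_a = odds(*group_b) / odds(*group_a)

print(f"A coded 1: OR = {or_a_vs_b:.2f}, b = {math.log(or_a_vs_b):+.2f}")
print(f"B coded 1: OR = {or_b_vs_a:.2f}, b = {math.log(or_b_vs_a):+.2f}")
```

The same data yield a positive coefficient under one coding and an equal-sized negative one under the other, so the coding must be checked before reading the table.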
Logistic Regression: Example 2
What individual characteristics are related to
experiencing foreclosure on one’s home?
Binary outcome (foreclosed or not foreclosed),
so logistic regression
Contrast this with a question that could be answered
with linear regression:
• What neighbourhood characteristics are related
to a high foreclosure rate?