DTC Quantitative Research Methods: Three (or more) Variables: Extensions to Cross-tabular Analyses

Thursday 13th November 2014
Multivariate analysis
• So far we have tended to concentrate on two-way relationships
(e.g. between gender and participation in sports). But we have
started to look at three-way relationships (e.g. the
gendering of the relationship between age and participation in
sports).
• Social relationships and phenomena are usually more complex
than is allowed for in a bivariate analysis.
• Multivariate analyses are thus commonly used as a reflection of
this complexity.
• Hence, this week we will look briefly at the rationale for
multivariate analysis and think about cross-tabular
techniques for conducting this form of analysis.
Multivariate analysis
De Vaus (1996: 198) suggests that we can use multivariate
analysis to elaborate bivariate relationships, in order to
answer the following questions:
1. Why does the relationship [between two variables] exist? What are
the mechanisms and processes by which one variable is linked to
another?
2. What is the nature of the relationship? Is it causal or non-causal?
3. How general is the relationship? Does it hold for people in general, or
is it specific to certain subgroups?
This is because multivariate analysis enables the identification of:
Spurious relationships
Intervening variables
The replication of relationships
The specification of relationships
Spurious relationships
• A spurious relationship exists where two
variables are not related but a relationship
between them is generated by their
relationships with a third variable.
• For example:
[Diagram: Age is related to both Height and Reading ability; the
apparent relationship between Height and Reading ability is spurious.]
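The age example can be illustrated with a small sketch. The counts below are invented for illustration (they are not from any survey) and `cramers_v` is a hypothetical helper written for this sketch. Within each age group height and reading ability are unrelated, yet pooling the two groups produces an association.

```python
from math import sqrt

def cramers_v(table):
    """Cramér's V for a table of counts given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return sqrt(chi2 / (n * (min(len(table), len(table[0])) - 1)))

# Hypothetical counts. Rows: short, tall. Columns: low, high reading ability.
younger = [[64, 16], [16, 4]]
older = [[4, 16], [16, 64]]
# Pooling ignores age: add the two tables cell by cell.
pooled = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(younger, older)]

v_pooled = cramers_v(pooled)
v_younger = cramers_v(younger)
v_older = cramers_v(older)
print(round(v_pooled, 2), round(v_younger, 2), round(v_older, 2))
```

With these made-up counts the pooled table gives a Cramér's V of about 0.36, while the value within each age group is zero: the pooled association is produced entirely by age.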
Intervening variables
• Sometimes, although there is a real (non-spurious) relationship
between two variables, we want to establish why that relationship
exists.
• For example, if we discover that there is a relationship between
risk of unemployment and ethnicity, we want to know why that is
the case. One possibility is that some ethnic groups have lower
educational levels and that this has implications for their ability to
get work. In this case education would be an intervening variable.
[Diagram: Ethnicity → Education → Unemployment]
• Intervening variables enable us to answer questions about the
bivariate relationship between two variables – suggesting that (in
this case) the relationship between ethnicity and unemployment
is not direct but (at least in part) occurs via educational levels.
Is it spurious or intervening?
When we do statistical tests we will obtain similar results for
a spurious variable and an intervening variable:
In both cases the effect of the independent variable on the
dependent variable will be reduced when we control for the third variable.
So how do we know whether this third variable provides
evidence of a spurious relationship or is an intervening
variable?
– There is no hard-and-fast statistical rule for deciding this.
– But if we are suggesting that a variable is intervening, the logic of
the process must make sense – i.e. you must have a cogent
theoretical reason for thinking that your independent variable
affects the intervening variable which in turn affects the dependent
variable.
– This kind of causal process is easiest to argue for when the timing
of events supports it, i.e. when the intervening variable can be
seen to occur in between the independent and dependent
variables (e.g. education in the earlier example of the relationship
between ethnicity and unemployment).
Replication
• Sometimes when we have found a basic
(‘zero-order’) relationship between two
variables (e.g. ethnicity and unemployment),
we want to demonstrate that this relationship
exists within different subgroups of the
population (e.g. for both men and women; for
those of different ages…).
• Where the relationship is replicated we can
rule out the possibility that it is produced by
the variable in question, either as an
intervening variable or in a spurious way.
Specification
• Sometimes a particular variable only has an
effect in specific situations. The variable that
determines these situations is said to interact
with the independent variable.
• For example, De Vaus’s book suggests that going to a
religious school makes boys more religious but has little
or no effect on girls.
• In this case type of school interacts with gender:
religious education only affects students’
religiosity in combination with being male.
Specification (interactions)
Graphical representation of the relationship between religious education
and religiousness, controlling for sex:
[Figure: two panels, each plotting Religiousness (low to high) against
"How religious was your education?" (Not at all to Very), with separate
lines for boys and girls. Panel 1: Interaction between sex and
religiousness of school. Panel 2: No interaction.]
Using Cramér’s V to classify a
multivariate situation
If we use SPSS to produce a cross-tabulation of two variables,
then we can elaborate this relationship by introducing a third
variable as a layer variable. Examining the Cramér’s V values
for the original cross-tabulation and for the layers of the
elaborated cross-tabulation tells us what kind of situation we
are looking at:
• If the Cramér’s V values for the layers are all similar, then
we have a situation of replication.
• If the Cramér’s V values are smaller for the layered cross-tabulation
than the value for the original cross-tabulation,
then we either have a situation where the third variable is
acting as an intervening variable, or one where it is
inducing a spurious relationship between the original two
variables. Deciding between these two options involves
reflecting on whether the third variable makes sense
conceptually as part of some causal mechanism linking the
original two variables.
Using Cramér’s V to classify a
multivariate situation (continued)
• If the Cramér’s V values for the layered cross-tabulation
vary in size, perhaps with some being smaller than the
original value and some being as large or larger than it,
then the situation is one of specification.
• However, if one or more of the Cramér’s V values is larger
than the original value, then a failure to take account of
the third variable in the first instance may also have been
suppressing an underlying relationship between the two
variables.
• This latter situation is a variation on the theme of
spuriousness: in this case, the absence of a bivariate
relationship is spurious rather than the presence of one!
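The rules of thumb above can be sketched as a small function. This is only one possible reading of them: the function name, the labels and the tolerance `tol` (what counts as "similar") are assumptions made for this sketch, not part of SPSS or of De Vaus's scheme.

```python
def classify_elaboration(v_original, v_layers, tol=0.05):
    """Label a three-variable situation from Cramér's V values.

    v_original: V for the original two-way cross-tabulation.
    v_layers:   V for each layer of the elaborated cross-tabulation.
    tol:        assumed threshold for treating two V values as similar.
    """
    diffs = [v - v_original for v in v_layers]
    # Any layer clearly stronger than the original hints at suppression.
    possible_suppression = any(d > tol for d in diffs)
    if all(abs(d) <= tol for d in diffs):
        label = "replication"
    elif all(d < -tol for d in diffs):
        label = "spurious or intervening"
    elif all(d > tol for d in diffs):
        label = "suppressed relationship"
    else:
        label = "specification"
    return label, possible_suppression

# Hypothetical V values, chosen only to exercise each rule.
print(classify_elaboration(0.30, [0.29, 0.31]))
print(classify_elaboration(0.30, [0.10, 0.12]))
print(classify_elaboration(0.30, [0.10, 0.45]))
```

The function cannot separate a spurious relationship from an intervening variable; as noted above, that choice rests on theory rather than on the statistics.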
More generally…
• Multivariate analyses can utilise a variety of techniques
(depending on the form of the data, research questions to be
addressed, etc. – we will be looking at multiple (linear)
regression, but other ‘popular’ techniques include logistic
regression and log-linear models), in order to determine whether
the relationship between two variables persists or is altered
when we ‘control for’ a third (or fourth, or fifth...) variable.
• Multivariate analysis can also enable us to establish which
variable(s) has/have the greatest impact on a dependent
variable – e.g. Is sex more important than ‘race’ in determining
income?
• It is often important for a multivariate analysis to check for
interactions between the effects of independent variables, as
discussed earlier under the heading of specification.
An example (from BSA 2006)
View on whether pre-marital sex wrong, by whether respondent has religion
(counts with row percentages)

Has religion?   Always   Mostly   Sometimes   Rarely   Not at all    Total
No                   7       13          41       52          361      474
                  1.5%     2.7%        8.6%    11.0%        76.2%   100.0%
Yes                 51       59          87       64          296      557
                  9.2%    10.6%       15.6%    11.5%        53.1%   100.0%
Total               58       72         128      116          657     1031
                  5.6%     7.0%       12.4%    11.3%        63.7%   100.0%

χ² (4 d.f.) = 80.81 (p < 0.001)
Cramér’s V = 0.280
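As a check on the figures above, the chi-square statistic and Cramér's V can be recomputed from the counts in the table. This is a minimal sketch in plain Python rather than SPSS output; the variable names are chosen for this sketch.

```python
from math import sqrt

# Counts from the table above. Rows: no religion, has religion.
# Columns: Always, Mostly, Sometimes, Rarely, Not at all.
observed = [
    [7, 13, 41, 52, 361],
    [51, 59, 87, 64, 296],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Pearson chi-square: sum over cells of (observed - expected)^2 / expected.
chi2 = 0.0
for i, row in enumerate(observed):
    for j, count in enumerate(row):
        expected = row_totals[i] * col_totals[j] / n
        chi2 += (count - expected) ** 2 / expected

# Degrees of freedom: (rows - 1) x (columns - 1).
df = (len(observed) - 1) * (len(observed[0]) - 1)

# Cramér's V: sqrt(chi2 / (n * (smaller table dimension - 1))).
v = sqrt(chi2 / (n * (min(len(observed), len(observed[0])) - 1)))

print(n, df, round(chi2, 2), round(v, 3))
```

Because the table has only two rows, the divisor reduces to n, so here V is simply the square root of chi-square divided by the sample size.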
But if we split the cross-tabulation by age...
Under 45: χ² (4 d.f.) = 53.52 (p < 0.001); Cramér’s V = 0.334
45 or over: χ² (4 d.f.) = 22.27 (p < 0.001); Cramér’s V = 0.201
Hence a small part of the bivariate relationship was a spurious
consequence of age (since 53.52 + 22.27 = 75.79, which is less
than 80.81). The Cramér’s V values show elements both of
replication (since there is a statistically significant
relationship for both age groups) and of specification (since the
relationship appears weaker for the older age group, whose
Cramér’s V is 0.201 rather than 0.334, i.e. the effects of
religion and age interact).
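The arithmetic behind this comparison can be laid out as a short sketch. It uses only the rounded figures quoted above; the implied subgroup sizes are back-calculated from them and so are approximate, not values reported in the source.

```python
# Values quoted above for the original table and the two age layers.
chi2_original = 80.81
layers = {"Under 45": (53.52, 0.334), "45 or over": (22.27, 0.201)}

# Sum of the layer chi-squares, and how far it falls short of the original.
chi2_layers_total = sum(chi2 for chi2, _ in layers.values())
shortfall = chi2_original - chi2_layers_total

# With two rows, V = sqrt(chi2 / n), so n is roughly chi2 / V**2.
implied_n = {name: chi2 / v ** 2 for name, (chi2, v) in layers.items()}

print(round(chi2_layers_total, 2), round(shortfall, 2))
print({name: round(size) for name, size in implied_n.items()})
```

The implied subgroup sizes come to roughly 480 and 551, which add up to about the 1031 cases in the original table, so the layer statistics are consistent with the overall cross-tabulation.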
Testing for interactions
• Unfortunately, as mentioned earlier, testing
for an interaction in a three-way cross-tabulation
requires knowledge of an additional technique
(hierarchical log-linear modelling).