Chapter 8 - Nora P. Reilly, Ph.D.

CHAPTER EIGHT:
The Correlational (Passive) Research Strategy
I. The nature of correlational research
A. Assumptions of linearity and additivity
1. Linearity. As a general rule, the use of correlational research methods assumes that the
relationship between the independent and dependent variables is linear.
2. Additivity. Correlational analyses involving more than one independent variable also generally
assume that the relationship between the independent and dependent variables is additive; that
is, that there are no interactions.
B. Factors affecting the correlation coefficient
1. Reliability of the measures. As measures become less reliable, the observed correlation between
their scores underestimates the true correlation between the variables being measured.
2. Restriction in range occurs when the scores of one or both variables in a sample have a range of
values that is less than the range of scores in the population. Restriction in range reduces the
correlation found in a sample relative to the correlation that exists in the population.
3. Outliers are extreme scores, usually defined as scores more than three standard deviations
above or below the mean. Outliers can either inflate or deflate a correlation, depending on
where they fall relative to the pattern formed by the rest of the scores (the sketch following
this list illustrates the effect).
4. Subgroup differences. The participant sample on which a correlation is based might contain
two or more subgroups, such as women and men. Unless the correlations between the variables
being studied are the same for all groups and all groups have the same mean scores on the
variables, the correlation in the combined group will not be an accurate reflection of the
subgroup correlations.
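The following sketch (Python with NumPy; the data are simulated and the specific numbers are hypothetical, not taken from the text) illustrates how restriction in range and a single outlier can change an observed correlation:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulate a large "population" in which the true correlation is about .50
n = 1000
x = rng.normal(size=n)
y = 0.5 * x + rng.normal(scale=np.sqrt(0.75), size=n)
r_full = np.corrcoef(x, y)[0, 1]

# Restriction in range: keep only cases with above-average x scores
keep = x > 0
r_restricted = np.corrcoef(x[keep], y[keep])[0, 1]

# Outlier: add one extreme case that does not fit the overall pattern
x_out = np.append(x[:100], 4.0)
y_out = np.append(y[:100], -4.0)
r_outlier = np.corrcoef(x_out, y_out)[0, 1]

print(f"full sample r      = {r_full:.2f}")
print(f"range-restricted r = {r_restricted:.2f}")
print(f"r with one outlier = {r_outlier:.2f}")
```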
C. Multifaceted constructs. As noted in Chapter 1, multifaceted constructs “are composed of two or
more subordinate concepts, each of which can be distinguished from the others and measured
separately, despite their being related to each other both logically and empirically” (Carver, 1989,
p. 577). A major issue in research using multifaceted constructs is that of when facets should and
should not be combined.
1. Keeping facets separate. Facets should not be combined
a. When the facets are theoretically or empirically related to different dependent variables or to
different facets of a dependent variable
b. When the theory of the construct predicts an interaction among the facets
c. Simply as a matter of convenience
2. Combining facets. Facets could be combined
a. When one is interested in the latent variable represented by the combination of facets rather
than in the particular aspects of the variables represented by the facets
b. When, from a theoretical perspective, the latent variable is more important, more interesting,
or represents a more appropriate level of abstraction than do the facets
c. If you are trying to predict an index based on many related behaviors rather than on a single
behavior
d. When the facets are highly correlated, although in such cases the facets probably represent
the same construct rather than different facets of a construct
D. Some recommendations. These limitations of the correlational method mean that one should
1. Use only the most reliable measures of the variables.
2. Whenever possible, check the ranges of the scores on the variables in your sample against
published norms or other data to determine if the ranges are restricted in the sample.
3. Plot the scores for the subgroups and the combined group and examine the plots for similarity,
deviations from linearity, and outliers.
4. Compute subgroup correlations and means, and check to ensure that they do not have an
adverse effect on the combined correlation.
5. When dealing with multifaceted constructs, avoid combining facets unless there is a compelling
reason to do so.
II. Simple and partial correlation analysis
A. Simple correlations
1. The correlation coefficient. Simple correlations are used to examine relationships between
variables and to develop bivariate regression equations.
2. Differences in correlations. Sometimes it is useful to know if the relationship between two
variables differs between groups. One must be careful when testing differences in correlation
coefficients, however, because a lack of difference in correlation coefficients does not
necessarily mean that the relationship between the variables is the same: the correlation
coefficient can be the same in two groups even though the regression slope is different.
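A small simulated sketch (Python with NumPy and SciPy; the groups and values are hypothetical) of the caution above: two groups can show similar correlations yet different regression slopes, and the conventional Fisher r-to-z procedure can be used to test the difference between two independent correlations:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n1, n2 = 120, 150

# Group 1: criterion on its raw scale; Group 2: same correlation, but the
# criterion is measured on a scale with twice the spread, so the slope doubles
x1 = rng.normal(size=n1)
y1 = 0.5 * x1 + rng.normal(scale=np.sqrt(0.75), size=n1)
x2 = rng.normal(size=n2)
y2 = 2.0 * (0.5 * x2 + rng.normal(scale=np.sqrt(0.75), size=n2))

r1, r2 = np.corrcoef(x1, y1)[0, 1], np.corrcoef(x2, y2)[0, 1]
b1, b2 = np.polyfit(x1, y1, 1)[0], np.polyfit(x2, y2, 1)[0]  # raw-score slopes

# Fisher r-to-z test for the difference between two independent correlations
z = (np.arctanh(r1) - np.arctanh(r2)) / np.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
p = 2 * stats.norm.sf(abs(z))

print(f"group 1: r = {r1:.2f}, slope = {b1:.2f}")
print(f"group 2: r = {r2:.2f}, slope = {b2:.2f}")
print(f"difference in r: z = {z:.2f}, p = {p:.3f}")
```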
B. Partial correlation analysis allows one to determine the strength of the relationship between two
variables with the effect of a third variable removed.
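A minimal sketch of a first-order partial correlation computed from the three zero-order correlations using the standard formula (the values here are hypothetical):

```python
import numpy as np

def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y with z held constant."""
    return (r_xy - r_xz * r_yz) / np.sqrt((1 - r_xz**2) * (1 - r_yz**2))

# Hypothetical zero-order correlations
r_xy, r_xz, r_yz = 0.50, 0.60, 0.70
print(f"zero-order r(x, y)  = {r_xy:.2f}")
print(f"partial r(x, y | z) = {partial_r(r_xy, r_xz, r_yz):.2f}")
```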
III. Multiple regression analysis (MRA) is an extension of simple and partial correlation to situations
in which there are more than two independent variables.
A. Forms of MRA
1. Simultaneous MRA derives the equation that most accurately predicts a criterion variable from
a set of predictor variables, using all of the predictors in the set.
2. Hierarchical MRA. In hierarchical MRA, predictors are partialed one at a time with the
researcher choosing the order of partialing to answer a particular question. It is this control over
the order of partialing that makes hierarchical MRA appropriate for testing hypotheses about
relationships between predictor variables and a criterion variable with other variables
controlled. The variables to be controlled must be entered first, regardless of their correlations
with the criterion, an entry order that only hierarchical MRA permits (a minimal sketch of this
two-step approach follows this list).
3. Many experts now consider stepwise MRA to be flawed and do not recommend its use.
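A minimal sketch of the hierarchical approach described in item 2, assuming the statsmodels package is available; the variable names (age, motivation, performance) are hypothetical stand-ins for a control variable, a predictor of interest, and a criterion:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 200

# Hypothetical variables: 'age' is the control, 'motivation' is the predictor of interest
age = rng.normal(size=n)
motivation = 0.4 * age + rng.normal(size=n)
performance = 0.3 * age + 0.5 * motivation + rng.normal(size=n)

# Step 1: enter the control variable first
step1 = sm.OLS(performance, sm.add_constant(age)).fit()

# Step 2: add the predictor of interest and note the change in R^2
X2 = sm.add_constant(np.column_stack([age, motivation]))
step2 = sm.OLS(performance, X2).fit()

print(f"R^2, step 1 (control only) = {step1.rsquared:.3f}")
print(f"R^2, step 2 (+ motivation) = {step2.rsquared:.3f}")
print(f"change in R^2              = {step2.rsquared - step1.rsquared:.3f}")
```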
B. Information available from MRA
1. The multiple correlation coefficient (R) is an index of the degree of association between the
predictor variables as a set and the criterion variable, just as r is an index of the degree of
association between a single predictor variable and a criterion variable. R provides no
information about the relationship of any one predictor variable to the criterion variable.
2. The regression coefficient represents the amount of change in Y associated with a one-unit change
in X. Regression coefficients can be either standardized (β) or unstandardized (B).
a. Because the βs for all the independent variables in an analysis are on the same scale, these
coefficients can be used to compare the degree to which the independent variables used in
the same regression analysis predict the dependent variable.
b. Because the Bs have the same units regardless of the sample, these coefficients can be used to
compare the predictive utility of independent variables across samples.
3. Change in R² represents the increase in the proportion of variance in the dependent variable
that is accounted for by adding another independent variable to the regression equation.
However, the change in R² associated with an independent variable can fluctuate as a function
of the order in which the variable is entered into the equation. A variable entered earlier will
generally result in a larger change in R² than if it is entered later, especially if it has a high
correlation with the other predictor variables.
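A brief sketch (statsmodels, simulated data with hypothetical scales) of the difference between unstandardized (B) and standardized (β) coefficients; here each β is obtained by rescaling B by the ratio of the predictor's standard deviation to the criterion's:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 300
x1 = rng.normal(scale=2.0, size=n)   # predictor measured in "large" units
x2 = rng.normal(scale=0.5, size=n)   # predictor measured in "small" units
y = 1.0 * x1 + 4.0 * x2 + rng.normal(size=n)

fit = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()

B = fit.params[1:]  # unstandardized coefficients, in raw-score units
beta = B * np.array([x1.std(ddof=1), x2.std(ddof=1)]) / y.std(ddof=1)

print("B (raw units):  ", np.round(B, 2))     # very different values
print("beta (z-units): ", np.round(beta, 2))  # directly comparable within this sample
```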
C. The problem of multicollinearity. Multicollinearity is a condition that arises when two or more
predictor variables are highly correlated with each other.
1. Effects of multicollinearity. Multicollinearity can lead to inflated error terms for regression
coefficients and to misleading conclusions about changes in R².
2. Causes of multicollinearity. Multicollinearity can arise from several causes, including
a. Inclusion of multiple measures of one construct in the set of predictor variables
b. Highly correlated predictor variables
c. Sampling error
3. Detecting multicollinearity. Although there is no statistical test for multicollinearity, two rules
of thumb apply:
a. The simplest test is inspection of the correlation matrix; correlations equal to or greater than
.80 are generally taken as being indicative of multicollinearity.
b. Multicollinearity is also a function of the pattern of correlations among several predictors,
none of which might exceed .80, so multicollinearity might not be detectable through
inspection. Therefore, another method of testing for multicollinearity is to compute a series
of multiple regression equations predicting each independent variable from the remaining
independent variables. If R for an equation exceeds .9, the predicted variable is
multicollinear with at least one of the other variables. The multiple regression modules of
many statistical software packages give the option of computing the variance inflation factor
(VIF). A VIF greater than 10 indicates the presence of multicollinearity.
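A short sketch of the VIF check just described, assuming statsmodels is available; the predictor names are hypothetical stand-ins for two redundant measures of one construct plus a distinct predictor:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(5)
n = 250

# Two nearly redundant measures of one construct plus one distinct predictor
anxiety_scale_a = rng.normal(size=n)
anxiety_scale_b = anxiety_scale_a + rng.normal(scale=0.2, size=n)
workload = rng.normal(size=n)

X = sm.add_constant(np.column_stack([anxiety_scale_a, anxiety_scale_b, workload]))
names = ["anxiety_scale_a", "anxiety_scale_b", "workload"]

for i, name in enumerate(names, start=1):   # start=1 skips the constant column
    print(f"VIF {name:15s} = {variance_inflation_factor(X, i):6.1f}")
# VIF values greater than 10 suggest multicollinearity
```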
4. Dealing with multicollinearity
a. Avoid including redundant variables, such as multiple measures of a construct and natural
confounds, in the set of predictor variables.
b. If sampling error could be the source of multicollinearity, collecting more data to reduce the
sampling error might reduce the problem.
c. Another solution is to delete independent variables that might be the cause of the problem.
d. Finally, you might conduct a factor analysis of the predictor variables and combine
empirically related variables into one variable.
D. MRA as an alternative to ANOVA. Because there are mathematical ways around the assumptions
of linearity and additivity, MRA can sometimes be a useful alternative to ANOVA.
1. Continuous independent variables. In ANOVA, when an independent variable is measured as
a continuous variable, it must be transformed into a categorical variable so that research
participants can be placed into the discrete groups required by ANOVA. This transformation is
often accomplished using a median split. The use of median splits can lead to several problems:
a. Because different samples can have different medians, the reliability of median split
classifications can be low.
b. Median splits result in lower statistical power.
c. Median splits with two or more correlated independent variables in a factorial design can
lead to false statistical significance.
d. These problems all have the same solution: Treat the independent variable as continuous
rather than as a set of categories, and analyze the data using MRA (the sketch following
this list illustrates the information lost to a median split).
2. Correlated independent variables. Because ANOVA assumes that the independent variables
are uncorrelated and MRA does not, MRA is preferred to ANOVA when independent variables
are correlated.
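A simulated sketch (Python with NumPy; the data are hypothetical) of the attenuation referred to in item 1d: the same data analyzed with the predictor kept continuous versus median-split:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 400

# Continuous predictor and criterion with a moderate linear relationship
x = rng.normal(size=n)
y = 0.5 * x + rng.normal(scale=np.sqrt(0.75), size=n)
r_continuous = np.corrcoef(x, y)[0, 1]

# Median split: recode the predictor into "low" (0) versus "high" (1)
x_split = (x > np.median(x)).astype(float)
r_split = np.corrcoef(x_split, y)[0, 1]   # point-biserial correlation

print(f"r with x kept continuous = {r_continuous:.2f}")
print(f"r after a median split   = {r_split:.2f}")   # noticeably smaller
```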
IV. Some other correlational techniques
A. Logistic regression analysis is an analog to MRA used when the dependent variable is categorical
rather than continuous.
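A minimal logistic regression sketch, assuming statsmodels; the predictor and binary outcome are hypothetical:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
n = 300

# Hypothetical continuous predictor and a binary outcome (1 = pass, 0 = fail)
hours = rng.normal(size=n)
p_pass = 1 / (1 + np.exp(-(0.8 * hours - 0.2)))
passed = rng.binomial(1, p_pass)

model = sm.Logit(passed, sm.add_constant(hours)).fit(disp=False)
print(model.params)          # coefficients on the log-odds scale
print(np.exp(model.params))  # exponentiated coefficients are odds ratios
```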
B. Multiway frequency analysis allows a researcher to examine the pattern of relationships among a
set of nominal level variables.
1. The most familiar example of multiway frequency analysis is the chi-square test for association,
which examines the degree of relationship between two nominal level variables (a minimal sketch
appears after this list).
2. Loglinear analysis extends the principles of chi-square analysis to situations in which there are
more than two variables. When one of the variables in a loglinear analysis is considered to be
the dependent variable and the others are considered to be independent variables, the
procedure is sometimes called logit analysis.
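A minimal sketch of the chi-square test for association mentioned in item 1, using SciPy and a hypothetical 2 × 2 table of counts:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical 2 x 2 table of counts: rows = group (A, B), columns = outcome (yes, no)
table = np.array([[30, 20],
                  [15, 35]])

chi2, p, dof, expected = chi2_contingency(table)
print(f"chi-square = {chi2:.2f}, df = {dof}, p = {p:.4f}")
```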
C. Data types and data analysis. Each combination of categorical or continuous independent variable
and categorical or continuous dependent variable has an appropriate statistical procedure for
data analysis. Be sure to use the right form of statistical analysis for the combination of data types
that you have in your research.
V. Testing mediational hypotheses. Mediational models postulate that an independent variable (I) affects
a mediating variable (M), which in turn affects the dependent variable (D); that is, I → M → D.
A. Simple mediation: Three variables. A mediational situation potentially exists when I is correlated
with both D and M, and M is correlated with D. The existence of mediation can be tested by taking
the partial correlation of I with D controlling for M: if the partial correlation is substantially smaller
than the zero-order correlation between I and D, then M mediates the relationship between I and
D.
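A small numerical sketch of this partial-correlation test of mediation, using hypothetical zero-order correlations among I, M, and D:

```python
import numpy as np

def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y with z held constant."""
    return (r_xy - r_xz * r_yz) / np.sqrt((1 - r_xz**2) * (1 - r_yz**2))

# Hypothetical zero-order correlations among I, M, and D
r_ID, r_IM, r_MD = 0.40, 0.60, 0.65

print(f"zero-order r(I, D)  = {r_ID:.2f}")
print(f"partial r(I, D | M) = {partial_r(r_ID, r_IM, r_MD):.2f}")
# A substantial drop is consistent with, but does not by itself prove, mediation by M
```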
B. Complex models
1. Path analysis. Models with more than one mediating variable can be tested by path analysis,
which uses sets of multiple regression analyses to estimate the strength of the relationship
between an independent variable and a dependent variable controlling for the hypothesized
mediating variables.
2. Latent variables analysis (also called covariance structure analysis and LISREL analysis) uses
the multiple measures of each construct to estimate a latent variable score representing the
construct. The technique then estimates the path coefficients for the relationships among the
latent variables.
3. Prospective research. The use of prospective correlations—examining the correlation of a
hypothesized cause at Time 1 with its hypothesized effect at Time 2—is one way of
investigating the time precedence of a possible causal variable.
C. Interpretational limitations
1. Completeness of the model. One must be sure that a test of a mediational model includes all
relevant variables and that the assumptions of linearity and additivity are met.
2. Alternative models. One must also consider the possibility that there are alternative models to
the one tested that fit the data equally well and therefore offer alternative interpretations of the
data.
VI. Factor analysis is a statistical technique that can be applied to a set of variables to identify subsets
of variables that are correlated with each other but that are relatively uncorrelated with the
variables in the other subsets.
A. Uses of factor analysis. At the most general level, factor analysis is used to summarize the pattern
of correlations among a set of variables. In practice, factor analysis can serve several purposes, two
of which tend to predominate.
1. Data reduction uses factor analysis to condense a large number of variables into a few in order
to simplify data analysis.
2. Scale development. Factor analysis can determine the number of constructs measured by a
scale. There will be one factor for each construct.
B. Considerations in factor analysis. There are at least seven approaches to determining the number
of factors underlying a set of correlations and nine ways of simplifying those factors so that they
can be easily interpreted. This discussion focuses on the more common questions that arise in
factor analysis and on the most common answers to these questions.
1. Number of research participants. Most authorities recommend 5 to 10 respondents per item
included in the analysis, with a minimum of 100 to 200 participants, although little
improvement in factor stability may be found when sample sizes exceed 300, as long as there
are more respondents than items.
2. Quality of the data
a. All the factors that we noted earlier as threats to the validity of correlational research—
outliers, restriction in range, and so forth—also threaten the validity of a factor analysis.
Multicollinearity is also a problem, because a large number of extremely high correlations
causes problems in the mathematics underlying the technique.
b. The correlation matrix of the scores on the items to be factor analyzed should include at least
several large correlations, indicating that sets of items are interrelated; one should not
conduct a factor analysis if there are no correlations larger than .30. One can also examine
the determinant of the correlation matrix: the closer the determinant is to zero, the higher
the correlations among the variables, and the more likely one is to find stable factors.
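A brief sketch (Python with NumPy, simulated item scores) of the two data-quality checks just described: the size of the largest correlations and the determinant of the correlation matrix:

```python
import numpy as np

rng = np.random.default_rng(8)
n_respondents = 300

# Hypothetical item scores: two clusters of three related items each
base1 = rng.normal(size=(n_respondents, 1))
base2 = rng.normal(size=(n_respondents, 1))
items = np.hstack([
    base1 + rng.normal(scale=0.8, size=(n_respondents, 3)),
    base2 + rng.normal(scale=0.8, size=(n_respondents, 3)),
])

R = np.corrcoef(items, rowvar=False)
off_diag = np.abs(R[np.triu_indices_from(R, k=1)])

print(f"largest off-diagonal |r| = {off_diag.max():.2f}")  # want some values above .30
print(f"determinant of R         = {np.linalg.det(R):.4f}")
```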
3. Methods of factor extraction and rotation
a. “Extraction” refers to the method used to determine the number of factors underlying a set
of correlations. Factors are extracted in order of importance, which is defined as the
percentage of variance in the variables being analyzed that a factor can account for. A factor
analysis will initially extract as many factors as there are variables, each accounting for a
decreasing percentage of variance. There are seven extraction methods, all of which give
very similar results with high-quality data and a reasonable sample size.
b. “Rotation” refers to the method used to clarify the factors once they are extracted. There are
two general categories of rotation. Orthogonal rotation forces factors to be uncorrelated with
one another; oblique rotation allows factors to be correlated.
4. Determining the number of factors. It is not always easy to decide how many factors underlie a
set of correlations. The decision is a matter of judgment rather than statistics, although there are
two common rules of thumb for guidance. These rules are based on the factors' eigenvalues,
which index the amount of variance in the variables being analyzed that a factor accounts for
(an eigenvalue of 1 equals the variance of a single standardized variable).
a. Generally, factors with eigenvalues of less than 1 are considered to be unimportant.
b. When there are many factors with eigenvalues greater than 1, the scree test is often used to
reduce the number of factors. The scree test is conducted by plotting the eigenvalue of each
factor against its order of extraction; generally, the scree plot will decline sharply, then level
off. The point at which the scree levels off indicates the optimal number of factors in the
data.
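A simulated sketch (Python with NumPy; the items and loadings are hypothetical) of the eigenvalue-based rules of thumb: the eigenvalues of the correlation matrix are computed, the number exceeding 1 is counted, and plotting them in order of size would give the scree plot:

```python
import numpy as np

rng = np.random.default_rng(9)
n_respondents = 300

# Hypothetical scores on eight items driven by two underlying factors
f1 = rng.normal(size=(n_respondents, 1))
f2 = rng.normal(size=(n_respondents, 1))
items = np.hstack([
    0.7 * f1 + rng.normal(scale=0.7, size=(n_respondents, 4)),
    0.7 * f2 + rng.normal(scale=0.7, size=(n_respondents, 4)),
])

R = np.corrcoef(items, rowvar=False)
eigenvalues = np.sort(np.linalg.eigvalsh(R))[::-1]   # largest first

print("eigenvalues:", np.round(eigenvalues, 2))
print("factors with eigenvalue > 1:", int(np.sum(eigenvalues > 1)))
# Plotting the eigenvalues against their rank order gives the scree plot;
# the point at which the curve levels off suggests how many factors to retain.
```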
5. Interpreting the factors
a. The result of a factor analysis is a matrix of factor loadings; the loadings represent the
correlation of each item with the underlying factor. Most authorities hold that an item
should have a loading of at least .30 to be considered part of a factor.
b. One must decide what construct underlies the variables on each factor and name the factors.
Factor interpretation and naming are completely judgmental processes: One examines the
items that load on a factor and tries to determine the concept that is common to them, giving
more weight to items that have factor loadings with higher absolute values.
c. Factor loadings can be either positive or negative, and a variable can load on more than one
factor.
6. Factor scores, respondents’ combined scores for each factor, can be computed in two ways:
a. Have the factor analysis computer program generate factor score coefficients for the items,
multiply each participant’s Z-score on each item by the item’s factor score coefficient for a
factor, and sum the resulting products.
b. When all the items use the same rating scale (e.g., a 1-to-7 scale), one can reverse score items
with negative loadings and sum participants’ scores on the items that fall at or above the
cutoff point for loading on each factor, just as one sums the scores on the items of a multi-item scale.
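A small sketch of the second (unit-weighting) method, using hypothetical 1-to-7 ratings and a hypothetical negatively loading item:

```python
import numpy as np

# Hypothetical 1-to-7 ratings from five respondents on four items that load on one factor
ratings = np.array([
    [6, 5, 2, 7],
    [3, 4, 5, 2],
    [7, 6, 1, 6],
    [2, 2, 6, 3],
    [5, 5, 3, 5],
], dtype=float)

# Suppose the third item has a negative loading: reverse-score it on the 1-to-7 scale
ratings[:, 2] = 8 - ratings[:, 2]

# Unit-weighted factor scores: sum the (reverse-scored) items that load on the factor
factor_scores = ratings.sum(axis=1)
print(factor_scores)
```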
QUESTIONS AND EXERCISES FOR REVIEW
1. Describe the circumstances in which the correlational research strategy would be preferable to the
experimental strategy. What can correlational research tell us about causality?
2. Describe the effects of each of these factors on the size of a correlation coefficient:
a. Low reliability of the measures
b. Restriction in range
c. Outliers
d. Subgroup differences in the correlation between the variables
3. For the factors listed in Question 2, describe how you can determine if a problem exists, and describe
what can be done to rectify the problem.
4. Find three journal articles on a topic that interests you that used correlational research. Did the
researchers report whether they checked for the potential problems listed in Question 2? If they found
any, did they take the appropriate steps? If they did not check, how would the presence of each problem
affect the interpretation of their results?
5. If you find a significant difference between groups, such as men and women, in the size of the
correlation between two variables, how should you interpret the finding?
6. Explain why it is generally undesirable to combine the facets of a multifaceted construct into an overall
index. Describe the circumstances under which it might be useful to combine facets.
7. Describe the purpose of partial correlation analysis.
8. Describe the forms of multiple regression analysis (MRA) and the purpose for which each is best suited.
9. Describe the type of information that each of the following provides about the relationship between the
predictor variables and the criterion variable in MRA:
a. The multiple correlation coefficient (R)
b. The standardized regression coefficient (β)
c. The unstandardized regression coefficient (B)
d. Change in R²
10. Describe the effects of multicollinearity. How can you detect and deal with this problem?
11. Describe the circumstances under which MRA is preferable to ANOVA.
12. Why is it undesirable to use a median split to transform a continuous variable into a categorical variable?
13. How is logistic regression analysis similar to MRA, and how is it different?
14. When should one use multiway frequency analysis?
15. How does the nature of the independent and dependent variables affect the form of data analysis you
should use?
16. Describe how mediational hypotheses are tested. Explain the limits on the interpretation of research that
tests mediational hypotheses.
17. What is factor analysis? Describe the major issues to consider in conducting and understanding the
results of a factor analysis.