CHAPTER EIGHT: The Correlational (Passive) Research Strategy

I. The nature of correlational research
   A. Assumptions of linearity and additivity
      1. Linearity. As a general rule, the use of correlational research methods assumes that the relationship between the independent and dependent variables is linear.
      2. Additivity. Correlational analyses involving more than one independent variable also generally assume that the relationship between the independent and dependent variables is additive; that is, that there are no interactions.
   B. Factors affecting the correlation coefficient
      1. Reliability of the measures. As measures become less reliable, the observed correlation between their scores underestimates the true correlation between the variables being measured.
      2. Restriction in range occurs when the scores of one or both variables in a sample have a range of values that is less than the range of scores in the population. Restriction in range reduces the correlation found in a sample relative to the correlation that exists in the population.
      3. Outliers are extreme scores, usually defined as scores more than three standard deviations above or below the mean. Outliers can either inflate or deflate correlations depending on whether they are extremely high or extremely low scores.
      4. Subgroup differences. The participant sample on which a correlation is based might contain two or more subgroups, such as women and men. Unless the correlations between the variables being studied are the same for all groups and all groups have the same mean scores on the variables, the correlation in the combined group will not be an accurate reflection of the subgroup correlations.
   C. Multifaceted constructs. As noted in Chapter 1, multifaceted constructs “are composed of two or more subordinate concepts, each of which can be distinguished from the others and measured separately, despite their being related to each other both logically and empirically” (Carver, 1989, p. 577). A major issue in research using multifaceted constructs is that of when facets should and should not be combined.
      1. Keeping facets separate. Facets should not be combined
         a. When the facets are theoretically or empirically related to different dependent variables or to different facets of a dependent variable
         b. When the theory of the construct predicts an interaction among the facets
         c. Simply as a matter of convenience
      2. Combining facets. Facets could be combined
         a. When one is interested in the latent variable represented by the combination of facets rather than in the particular aspects of the variables represented by the facets
         b. When, from a theoretical perspective, the latent variable is more important, more interesting, or represents a more appropriate level of abstraction than do the facets
         c. If you are trying to predict an index based on many related behaviors rather than on a single behavior
         d. If the facets are highly correlated, although in such cases the facets probably represent the same construct rather than different facets of a construct
   D. Some recommendations. These limitations of the correlational method mean that one should
      1. Use only the most reliable measures of the variables.
      2. Whenever possible, check the ranges of the scores on the variables in your sample against published norms or other data to determine if the ranges are restricted in the sample.
      3. Plot the scores for the subgroups and the combined group and examine the plots for similarity, deviations from linearity, and outliers.
      4. Compute subgroup correlations and means, and check to ensure that they do not have an adverse effect on the combined correlation (a brief illustration follows this list).
      5. When dealing with multifaceted constructs, avoid combining facets unless there is a compelling reason to do so.
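As a rough illustration of the recommendations above, here is a minimal diagnostic sketch in Python; the variables, subgroups, and data are hypothetical, and pandas, NumPy, and SciPy are assumed to be available. It checks score ranges, flags scores more than three standard deviations from the mean, and compares subgroup correlations and means before the combined correlation is interpreted.

```python
# A minimal diagnostic pass before interpreting a correlation (hypothetical data).
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(0)
n = 200
group = rng.choice(["women", "men"], size=n)     # hypothetical subgroups
x = rng.normal(50, 10, size=n)                   # predictor scores
y = 0.5 * x + rng.normal(0, 8, size=n)           # criterion scores
df = pd.DataFrame({"group": group, "x": x, "y": y})

# Combined correlation
r, p = stats.pearsonr(df["x"], df["y"])
print(f"combined r = {r:.2f} (p = {p:.3f})")

# Check for restriction in range (compare with published norms, if available)
print("x range:", df["x"].min(), "to", df["x"].max(), "SD =", round(df["x"].std(), 2))

# Flag scores more than three SDs from the mean as possible outliers
z = np.abs(stats.zscore(df[["x", "y"]].to_numpy()))
print("possible outliers (row indices):", df.index[(z > 3).any(axis=1)].tolist())

# Subgroup correlations and means
for name, sub in df.groupby("group"):
    r_g, _ = stats.pearsonr(sub["x"], sub["y"])
    print(f"{name}: r = {r_g:.2f}, mean x = {sub['x'].mean():.1f}, mean y = {sub['y'].mean():.1f}")
```

If the subgroup correlations or means differ noticeably, the combined correlation should be interpreted with caution, as noted in point B.4 above.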
II. Simple and partial correlation analysis
   A. Simple correlations
      1. The correlation coefficient. Simple correlations are used to examine relationships between variables and to develop bivariate regression equations.
      2. Differences in correlations. Sometimes it is useful to know if the relationship between two variables differs between groups. One must be careful when testing differences in correlation coefficients, however, because a lack of difference in correlation coefficients does not necessarily mean that the relationship between the variables is the same: the correlation coefficient can be the same in two groups even though the regression slope is different.
   B. Partial correlation analysis allows one to determine the strength of the relationship between two variables with the effect of a third variable removed.

III. Multiple regression analysis (MRA) is an extension of simple and partial correlation to situations in which there are more than two independent variables.
   A. Forms of MRA
      1. Simultaneous MRA derives the equation that most accurately predicts a criterion variable from a set of predictor variables, using all of the predictors in the set.
      2. Hierarchical MRA. In hierarchical MRA, predictors are partialed one at a time, with the researcher choosing the order of partialing to answer a particular question. It is this control over the order of partialing that makes hierarchical MRA appropriate for testing hypotheses about relationships between predictor variables and a criterion variable with other variables controlled. The variables to be controlled must be entered first regardless of their correlations with the criterion, which can only be done with hierarchical MRA.
      3. Stepwise MRA. Many experts now consider stepwise MRA to be flawed and do not recommend its use.
   B. Information available from MRA
      1. The multiple correlation coefficient (R) is an index of the degree of association between the predictor variables as a set and the criterion variable, just as r is an index of the degree of association between a single predictor variable and a criterion variable. R provides no information about the relationship of any one predictor variable to the criterion variable.
      2. The regression coefficient represents the amount of change in Y brought about by a unit change in X. Regression coefficients can be either standardized (β) or unstandardized (B).
         a. Because the βs for all the independent variables in an analysis are on the same scale, these coefficients can be used to compare the degree to which the independent variables used in the same regression analysis predict the dependent variable.
         b. Because the Bs have the same units regardless of the sample, these coefficients can be used to compare the predictive utility of independent variables across samples.
      3. Change in R² represents the increase in the proportion of variance in the dependent variable that is accounted for by adding another independent variable to the regression equation. However, the change in R² associated with an independent variable can fluctuate as a function of the order in which the variable is entered into the equation. A variable entered earlier will generally result in a larger change in R² than if it is entered later, especially if it has a high correlation with the other predictor variables (see the sketch below).
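To make the distinction between R, the regression coefficients, and the change in R² concrete, here is a minimal hierarchical MRA sketch in Python using statsmodels; the control variable, predictor, and data are hypothetical. The control variable is entered at Step 1, the predictor of interest at Step 2, and the change in R² is the difference between the two models' R² values.

```python
# Hierarchical MRA sketch: enter the control variable first, then the predictor
# of interest, and examine the change in R-squared (hypothetical data).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 150
control = rng.normal(size=n)                      # variable to be controlled
predictor = 0.4 * control + rng.normal(size=n)    # predictor of interest
criterion = 0.5 * control + 0.3 * predictor + rng.normal(size=n)
df = pd.DataFrame({"control": control, "predictor": predictor, "criterion": criterion})

step1 = smf.ols("criterion ~ control", data=df).fit()
step2 = smf.ols("criterion ~ control + predictor", data=df).fit()

print(f"Step 1 R-squared = {step1.rsquared:.3f}")
print(f"Step 2 R-squared = {step2.rsquared:.3f}")
print(f"Change in R-squared for the predictor = {step2.rsquared - step1.rsquared:.3f}")
print(step2.params)   # unstandardized regression coefficients (B)
```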
   C. The problem of multicollinearity. Multicollinearity is a condition that arises when two or more predictor variables are highly correlated with each other.
      1. Effects of multicollinearity. Multicollinearity can lead to inflated error terms for regression coefficients and to misleading conclusions about changes in R².
      2. Causes of multicollinearity. Multicollinearity can arise from several causes, including
         a. Inclusion of multiple measures of one construct in the set of predictor variables
         b. Highly correlated predictor variables
         c. Sampling error
      3. Detecting multicollinearity. Although there is no statistical test for multicollinearity, two rules of thumb apply:
         a. The simplest test is inspection of the correlation matrix; correlations equal to or greater than .80 are generally taken as being indicative of multicollinearity.
         b. Multicollinearity is also a function of the pattern of correlations among several predictors, none of which might exceed .80, so multicollinearity might not be detectable through inspection. Therefore, another method of testing for multicollinearity is to compute a series of multiple regression equations predicting each independent variable from the remaining independent variables. If R for an equation exceeds .9, the predicted variable is multicollinear with at least one of the other variables. The multiple regression modules of many statistical software packages give the option of computing the variance inflation factor (VIF). A VIF greater than 10 indicates the presence of multicollinearity (a brief sketch follows this section).
      4. Dealing with multicollinearity
         a. Avoid including redundant variables, such as multiple measures of a construct and natural confounds, in the set of predictor variables.
         b. If sampling error could be the source of multicollinearity, collecting more data to reduce the sampling error might reduce the problem.
         c. Another solution is to delete independent variables that might be the cause of the problem.
         d. Finally, you might conduct a factor analysis of the predictor variables and combine empirically related variables into one variable.
   D. MRA as an alternative to ANOVA. Because there are mathematical ways around the assumptions of linearity and additivity, MRA can sometimes be a useful alternative to ANOVA.
      1. Continuous independent variables. In ANOVA, when an independent variable is measured as a continuous variable, it must be transformed into a categorical variable so that research participants can be placed into the discrete groups required by ANOVA. This transformation is often accomplished using a median split. The use of median splits can lead to several problems:
         a. Because different samples can have different medians, the reliability of median split classifications can be low.
         b. Median splits result in lower statistical power.
         c. Median splits with two or more correlated independent variables in a factorial design can lead to false statistical significance.
         d. These problems all have the same solution: Treat the independent variable as continuous rather than as a set of categories, and analyze the data using MRA.
      2. Correlated independent variables. Because ANOVA assumes that the independent variables are uncorrelated and MRA does not, MRA is preferred to ANOVA when independent variables are correlated.
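Here is a minimal sketch of the VIF check described in point C.3.b, assuming statsmodels is available; the predictor variables and data are hypothetical, with one predictor deliberately built to be redundant with another.

```python
# Multicollinearity check using the variance inflation factor (VIF); hypothetical data.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(2)
n = 200
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + rng.normal(scale=0.3, size=n)     # deliberately redundant with x1
x3 = rng.normal(size=n)
X = sm.add_constant(pd.DataFrame({"x1": x1, "x2": x2, "x3": x3}))

# VIF for each predictor is computed from a regression of that predictor on the
# remaining predictors; values above about 10 are usually taken as a warning sign.
for i, name in enumerate(X.columns):
    if name != "const":
        print(f"VIF({name}) = {variance_inflation_factor(X.values, i):.2f}")
```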
IV. Some other correlational techniques
   A. Logistic regression analysis is an analog to MRA used when the dependent variable is categorical rather than continuous.
   B. Multiway frequency analysis allows a researcher to examine the pattern of relationships among a set of nominal level variables.
      1. The most familiar example of multiway frequency analysis is the chi-square test for association, which examines the degree of relationship between two nominal level variables.
      2. Loglinear analysis extends the principles of chi-square analysis to situations in which there are more than two variables. When one of the variables in a loglinear analysis is considered to be the dependent variable and the others are considered to be independent variables, the procedure is sometimes called logit analysis.
   C. Data types and data analysis. Each combination of categorical or continuous independent variable and categorical or continuous dependent variable has an appropriate statistical procedure for data analysis. Be sure to use the right form of statistical analysis for the combination of data types that you have in your research.

V. Testing mediational hypotheses. Mediational models postulate that an independent variable (I) affects a mediating variable (M), which in turn affects the dependent variable (D); that is, I → M → D.
   A. Simple mediation: Three variables. A mediational situation potentially exists when I is correlated with both D and M, and M is correlated with D. The existence of mediation can be tested by taking the partial correlation of I with D controlling for M: if the partial correlation is substantially smaller than the zero-order correlation between I and D, then M mediates the relationship between I and D (a brief sketch follows this section).
   B. Complex models
      1. Path analysis. Models with more than one mediating variable can be tested by path analysis, which uses sets of multiple regression analyses to estimate the strength of the relationship between an independent variable and a dependent variable controlling for the hypothesized mediating variables.
      2. Latent variables analysis (also called covariance structure analysis and LISREL analysis) uses the multiple measures of each construct to estimate a latent variable score representing the construct. The technique then estimates the path coefficients for the relationships among the latent variables.
      3. Prospective research. The use of prospective correlations—examining the correlation of a hypothesized cause at Time 1 with its hypothesized effect at Time 2—is one way of investigating the time precedence of a possible causal variable.
   C. Interpretational limitations
      1. Completeness of the model. One must be sure that a test of a mediational model includes all relevant variables and that the assumptions of linearity and additivity are met.
      2. Alternative models. One must also consider the possibility that there are alternative models to the one tested that fit the data equally well and therefore offer alternative interpretations of the data.
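The partial-correlation test of simple mediation described in section V.A can be sketched as follows; the data are hypothetical, and the first-order partial correlation is computed directly from the three zero-order correlations.

```python
# Simple mediation sketch: compare the zero-order correlation of I with D to the
# partial correlation of I with D controlling for M (hypothetical data).
import numpy as np

rng = np.random.default_rng(3)
n = 300
I = rng.normal(size=n)                  # independent variable
M = 0.6 * I + rng.normal(size=n)        # hypothesized mediating variable
D = 0.6 * M + rng.normal(size=n)        # dependent variable

r_ID = np.corrcoef(I, D)[0, 1]
r_IM = np.corrcoef(I, M)[0, 1]
r_MD = np.corrcoef(M, D)[0, 1]

# First-order partial correlation of I and D with M removed
r_ID_M = (r_ID - r_IM * r_MD) / np.sqrt((1 - r_IM**2) * (1 - r_MD**2))

print(f"zero-order r(I, D)  = {r_ID:.2f}")
print(f"partial r(I, D | M) = {r_ID_M:.2f}")
# A partial correlation much smaller than the zero-order correlation is
# consistent with M mediating the I -> D relationship.
```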
VI. Factor analysis is a statistical technique that can be applied to a set of variables to identify subsets of variables that are correlated with each other but that are relatively uncorrelated with the variables in the other subsets.
   A. Uses of factor analysis. At the most general level, factor analysis is used to summarize the pattern of correlations among a set of variables. In practice, factor analysis can serve several purposes, two of which tend to predominate.
      1. Data reduction uses factor analysis to condense a large number of variables into a few in order to simplify data analysis.
      2. Scale development. Factor analysis can determine the number of constructs measured by a scale. There will be one factor for each construct.
   B. Considerations in factor analysis. There are at least seven approaches to determining the number of factors underlying a set of correlations and nine ways of simplifying those factors so that they can be easily interpreted. This discussion focuses on the more common questions that arise in factor analysis and on the most common answers to these questions.
      1. Number of research participants. Most authorities recommend 5 to 10 respondents per item included in the analysis, with a minimum of 100 to 200 participants, although little improvement in factor stability may be found when sample sizes exceed 300, as long as there are more respondents than items.
      2. Quality of the data
         a. All the factors that we noted earlier as threats to the validity of correlational research—outliers, restriction in range, and so forth—also threaten the validity of a factor analysis. Multicollinearity is also a problem, because a large number of extremely high correlations causes problems in the mathematics underlying the technique.
         b. The correlation matrix of the scores on the items to be factor analyzed should include at least several large correlations, indicating that sets of items are interrelated; one should not conduct a factor analysis if there are no correlations larger than .30. One can also examine the determinant of the correlation matrix: the closer the determinant is to zero, the higher the correlations among the variables, and the more likely one is to find stable factors.
      3. Methods of factor extraction and rotation
         a. “Extraction” refers to the method used to determine the number of factors underlying a set of correlations. Factors are extracted in order of importance, which is defined as the percentage of variance in the variables being analyzed that a factor can account for. A factor analysis will initially extract as many factors as there are variables, each accounting for a decreasing percentage of variance. There are seven extraction methods, all of which give very similar results with high-quality data and a reasonable sample size.
         b. “Rotation” refers to the method used to clarify the factors once they are extracted. There are two general categories of rotation. Orthogonal rotation forces factors to be uncorrelated with one another; oblique rotation allows factors to be correlated.
      4. Determining the number of factors. It is not always easy to decide how many factors underlie a set of correlations. The decision is a matter of judgment rather than statistics, although there are two common rules of thumb for guidance. These rules are based on the factors’ eigenvalues, which represent the percentage of variance in the variables being analyzed that can be accounted for by a factor.
         a. Generally, factors with eigenvalues of less than 1 are considered to be unimportant.
         b. When there are many factors with eigenvalues greater than 1, the scree test is often used to reduce the number of factors. The scree test is conducted by plotting the eigenvalue of each factor against its order of extraction; generally, the scree plot will decline sharply, then level off. The point at which the scree levels off indicates the optimal number of factors in the data (see the sketch below).
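As a small illustration of the eigenvalue-greater-than-1 rule and the scree test, the sketch below computes the eigenvalues of the items' correlation matrix and counts those greater than 1. The items are hypothetical, generated from two underlying factors, and only NumPy is assumed.

```python
# Sketch of the eigenvalue-greater-than-1 rule and a scree check for a set of items
# (hypothetical data; the item structure is made up for illustration).
import numpy as np

rng = np.random.default_rng(4)
n = 300
factor1 = rng.normal(size=n)
factor2 = rng.normal(size=n)
# Six hypothetical items: three load on each of two underlying factors
items = np.column_stack(
    [factor1 + rng.normal(scale=0.6, size=n) for _ in range(3)]
    + [factor2 + rng.normal(scale=0.6, size=n) for _ in range(3)]
)

R = np.corrcoef(items, rowvar=False)               # correlation matrix of the items
eigenvalues = np.sort(np.linalg.eigvalsh(R))[::-1] # largest first

print("eigenvalues:", np.round(eigenvalues, 2))
print("factors with eigenvalue > 1:", int(np.sum(eigenvalues > 1)))
# For a scree test, plot the eigenvalues against their order of extraction and
# look for the point at which the curve levels off.
```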
      5. Interpreting the factors
         a. The result of a factor analysis is a matrix of factor loadings; the loadings represent the correlation of each item with the underlying factor. Most authorities hold that an item should have a loading of at least .30 to be considered part of a factor.
         b. One must decide what construct underlies the variables on each factor and name the factors. Factor interpretation and naming are completely judgmental processes: One examines the items that load on a factor and tries to determine the concept that is common to them, giving more weight to items that have factor loadings with higher absolute values.
         c. Factor loadings can be either positive or negative, and a variable can load on more than one factor.
      6. Factor scores, respondents’ combined scores for each factor, can be computed in two ways:
         a. Have the factor analysis computer program generate factor score coefficients for the items, multiply each participant’s Z-score on each item by the item’s factor score coefficient for a factor, and sum the resulting products.
         b. When all the items use the same rating scale (e.g., a 1-to-7 scale), one can reverse score items with negative loadings and sum participants’ scores on the items that fall at or above the cutoff point for loading on each factor, just as one sums the scores on the items of a multi-item scale (see the sketch below).
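Here is a minimal sketch of the second method of computing factor scores (reverse score, then sum). The item responses, loadings, the .30 cutoff, and the 1-to-7 scale follow the discussion above, but the specific numbers are hypothetical.

```python
# Unit-weighted factor score sketch (method b above): reverse score items with
# negative loadings, then sum the items whose loadings reach the .30 cutoff.
import numpy as np
import pandas as pd

responses = pd.DataFrame({          # five respondents, four items, 1-to-7 scale
    "item1": [6, 5, 2, 7, 4],
    "item2": [5, 6, 1, 6, 4],
    "item3": [2, 3, 6, 1, 4],       # worded in the opposite direction
    "item4": [4, 4, 4, 5, 3],
})
loadings = {"item1": 0.72, "item2": 0.65, "item3": -0.58, "item4": 0.12}
scale_min, scale_max = 1, 7

score = np.zeros(len(responses))
for item, loading in loadings.items():
    if abs(loading) < 0.30:
        continue                                     # item does not load on this factor
    values = responses[item]
    if loading < 0:
        values = (scale_max + scale_min) - values    # reverse score the item
    score = score + values.to_numpy()

print("factor scores:", score)
```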
QUESTIONS AND EXERCISES FOR REVIEW

1. Describe the circumstances in which the correlational research strategy would be preferable to the experimental strategy. What can correlational research tell us about causality?
2. Describe the effects of each of these factors on the size of a correlation coefficient:
   a. Low reliability of the measures
   b. Restriction in range
   c. Outliers
   d. Subgroup differences in the correlation between the variables
3. For the factors listed in Question 2, describe how you can determine if a problem exists, and describe what can be done to rectify the problem.
4. Find three journal articles on a topic that interests you that used correlational research. Did the researchers report whether they checked for the potential problems listed in Question 2? If they found any, did they take the appropriate steps? If they did not check, how would the presence of each problem affect the interpretation of their results?
5. If you find a significant difference between groups, such as men and women, in the size of the correlation between two variables, how should you interpret the finding?
6. Explain why it is generally undesirable to combine the facets of a multifaceted construct into an overall index. Describe the circumstances under which it might be useful to combine facets.
7. Describe the purpose of partial correlation analysis.
8. Describe the forms of multiple regression analysis (MRA) and the purpose for which each is best suited.
9. Describe the type of information that each of the following provides about the relationship between the predictor variables and the criterion variable in MRA:
   a. The multiple correlation coefficient (R)
   b. The standardized regression coefficient (β)
   c. The unstandardized regression coefficient (B)
   d. Change in R²
10. Describe the effects of multicollinearity. How can you detect and deal with this problem?
11. Describe the circumstances under which MRA is preferable to ANOVA.
12. Why is it undesirable to use a median split to transform a continuous variable into a categorical variable?
13. How is logistic regression analysis similar to MRA, and how is it different?
14. When should one use multiway frequency analysis?
15. How does the nature of the independent and dependent variables affect the form of data analysis you should use?
16. Describe how mediational hypotheses are tested. Explain the limits on the interpretation of research that tests mediational hypotheses.
17. What is factor analysis? Describe the major issues to consider in conducting and understanding the results of a factor analysis.