You are interested in the effect of physical activity on health. You do a cross-sectional study in which you measure health, physical activity, and other variables. In particular, you measure socioeconomic status (SES) and find that SES and physical activity are both positively correlated with health. Further, you find a high correlation between SES and activity. High correlations among the X variables are referred to as collinearity, or multicollinearity in the case of multiple X variables. People of high SES eat good food, live in toxin-free, classy parts of town, read Time magazine, and think they're alpha in every way. All these things could account for their good health. How do you analyze your data to address this potential for the effect of activity on health to be "confounded" by SES? How do you express the magnitude of the resulting effect of activity on health?

Pedhazur (1997) suggests calculating the partial or semi-partial correlations among the variables by asking specific questions about their relationships. When the X variables are correlated, the parts of Y that they explain overlap to some degree, and not as much of Y is explained as when the X variables are uncorrelated. Venn diagrams can be used to illustrate this concept. In this example, Y is the dependent variable, and X1 and X2 are the two independent variables. r12 is defined as the simple, zero-order correlation between variables X1 and X2 (i.e., collinearity). r² represents the coefficient of determination and is defined as the proportion of variance explained.

When r12 = 0, the X1 and X2 circles do not overlap: each variable explains its own part of Y, and r²y1 and r²y2 sum to R²y.12.

[Venn diagram: Y overlapping separately with X1 and X2; shaded areas labeled r²y1, r²y2, and R²y.12.]

When r12 ≠ 0, the two independent variables are explaining the same part of Y to some degree.
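The overlap idea can be made concrete with a small simulation. The sketch below is illustrative only (simulated data with invented coefficients, not values from the study above); it uses the standard two-predictor identity R²y.12 = (r²y1 + r²y2 - 2·ry1·ry2·r12) / (1 - r12²), which reduces to r²y1 + r²y2 when r12 = 0:

```python
import random

def pearson(x, y):
    """Simple Pearson product-moment correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# Simulate two correlated predictors of y (all coefficients are made up).
random.seed(0)
n = 5000
x2 = [random.gauss(0, 1) for _ in range(n)]
x1 = [0.7 * v + random.gauss(0, 1) for v in x2]        # r12 != 0: collinearity
y = [a + b + random.gauss(0, 1) for a, b in zip(x1, x2)]

ry1, ry2, r12 = pearson(y, x1), pearson(y, x2), pearson(x1, x2)

# Standard identity for R^2 with two predictors.
R2 = (ry1**2 + ry2**2 - 2 * ry1 * ry2 * r12) / (1 - r12**2)

# Here the overlapping slice of Y is counted only once, so R^2
# falls short of the sum of the squared zero-order correlations.
print(R2 < ry1**2 + ry2**2)  # True for this simulation
```

With r12 forced to zero in the simulation, R² would instead track r²y1 + r²y2, matching the first Venn diagram.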
Partial and Semi-partial Correlations

Because in multiple regression the correlations among the X variables influence the regression coefficients (b-values), it is often of interest to determine what the relationship between Y and some X variable would be if the other X variables were not in the equation. This is referred to as controlling for the effects of the other X variables. In experimental research, the independent variables are controlled by design, which results in uncorrelated independent variables. Other forms of control through research design are "matching" and "subject selection."

For example, below are three possibilities for visualizing the relationships among socioeconomic status (SES), physical activity (PA), and health (H), presented by Dr. Hopkins:

[Venn diagrams (a), (b), and (c), each showing circles for SES, PA, and H: (a) minimal collinearity between SES and PA; (b) moderate collinearity between SES and PA; (c) high collinearity between SES and PA.]

If we could control for the effects of SES, we would probably find a moderate to high correlation between PA and H, so Venn diagram (b) or (c) is more likely in this scenario. One way to control for SES would be to use only participants from one SES classification. We could also control for the effects of SES by a statistical method known as partialing.

A partial correlation is a measure of the correlation between two variables with the effects of a third variable removed or "partialed out" (i.e., controlled for). Partial correlations allow us to see what the correlation between two variables would be if the third variable were not there. It's important to realize that in actuality, the third variable is still there; its effects are removed statistically, not physically. Another way of describing a partial correlation is to say that it is the correlation between X1 and Y when everyone has the same value on X2. X2 would be the variable being partialed out.
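The "same value on X2" idea can be mimicked by subject selection in a simulation. In this hypothetical sketch (simulated data; all effect sizes are invented), SES drives both PA and H, and we compare the overall PA-H correlation with the correlation among participants in a narrow SES band:

```python
import random

def pearson(x, y):
    """Simple Pearson product-moment correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# Simulated participants: SES influences both PA and H (coefficients made up).
random.seed(1)
rows = []
for _ in range(20000):
    ses = random.gauss(0, 1)
    pa = 0.8 * ses + random.gauss(0, 1)
    h = 0.8 * ses + 0.3 * pa + random.gauss(0, 1)
    rows.append((ses, pa, h))

r_overall = pearson([r[1] for r in rows], [r[2] for r in rows])

# "Subject selection": keep only participants with nearly identical SES.
band = [r for r in rows if abs(r[0]) < 0.1]
r_band = pearson([r[1] for r in band], [r[2] for r in band])

# Within the band the PA-H correlation shrinks: part of the overall
# correlation was carried by SES.
print(r_band < r_overall)  # True for this simulation
```

Partialing, described next, achieves the same kind of control statistically, without discarding participants.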
The idea statistically is to correlate the parts of Y (health, H) and X1 (physical activity, PA) that are not related to X2 (socioeconomic status, SES). This involves removing the parts of H and PA that overlap with SES and correlating the parts of these variables that do not overlap with SES.

[Venn diagram: SES's overlap removed from the correlation between PA and H.]

Recall that when Y is regressed on X, the residuals (Y - Y') are uncorrelated with X (and with the predicted values Y'). Therefore, the parts of PA and H that are not related to SES are the residuals we would get by predicting H and PA from SES. Statistical control, or partialing, is therefore accomplished by:

1. Predicting H and PA from SES, where SES is the controlled variable.
2. Correlating the residuals from these two regressions, (H - H') and (PA - PA'). These are the parts of H and PA that are not related to SES.

Because it involves the use of residuals, this process is sometimes referred to as residualizing Y and X1 with respect to X2. The notation ry1.2 means the partial correlation of X1 (PA) and Y (H) with X2 (SES) partialed out. This is referred to as a first-order partial because only one variable is partialed out. Any number of variables could conceivably be partialed out, but first-order partials are probably most common. The formula for the first-order partial ry1.2 is:

    ry1.2 = (ry1 - ry2·r12) / (sqrt(1 - ry2²) · sqrt(1 - r12²))

Partial correlations may be larger or smaller than the corresponding zero-order correlations. They may even have a different sign.

Semi-partial Correlations

In partial correlations, the effects of the control variable are removed from both Y and the other X variable. Semi-partial correlations are used when we want to remove the effects of the control variable from only one variable, usually the other X variable. This is useful in answering questions such as: What does X2 contribute to explaining Y over and above what X1 contributes?
or: What does SES contribute to explaining H over and above what PA contributes? or: What is the additional variance accounted for by SES if PA is already in the equation?

The notation r²y(2.1) represents the squared semi-partial correlation between X2 (SES) and Y (H), with X1 (PA) partialed out of X2 (SES) only. The general formula for a first-order semi-partial correlation, written here for ry(1.2) (X2 partialed out of X1 only), is:

    ry(1.2) = (ry1 - ry2·r12) / sqrt(1 - r12²)

For ry(2.1), interchange the subscripts 1 and 2.

Using Venn diagrams:

[Venn diagrams contrasting the partial r²y1.2 with the semi-partial r²y(1.2), each showing circles for H, SES, and PA.]

When to Use Partial and Semi-partial Correlations

Semi-partial correlations are used to determine whether a variable explains any additional variance, that is, whether its contribution to the overall R² is significant once the effects of the other X variables have been taken into account. Partial correlations are used to determine the effect of one variable on another when a third variable has been removed or controlled for.

Examples:

1. Does physical activity (PA) have an effect on health (H) that is over and above that of socioeconomic status (SES)? To answer this question, one would calculate the semi-partial correlation:

    ry(1.2) = (ry1 - ry2·r12) / sqrt(1 - r12²)

where y = health, 1 = physical activity, and 2 = socioeconomic status. ry1, for example, is therefore the simple Pearson product-moment correlation between health and physical activity. To gauge the proportion of variance accounted for, one would square the result: r²y(1.2).

2. For participants at the same level of socioeconomic status (SES), is there a correlation between physical activity (PA) and health (H)? To answer this question, one would calculate the partial correlation:

    ry1.2 = (ry1 - ry2·r12) / (sqrt(1 - ry2²) · sqrt(1 - r12²))

where y = health, 1 = physical activity, and 2 = socioeconomic status. ry1, for example, is therefore the simple Pearson product-moment correlation between health and physical activity. To gauge the proportion of variance accounted for, one would square the result: r²y1.2.

References:
1. Bandalos, Deborah L.
EDPS 942: Intermediate Statistics: Correlational Methods. Spring 1999.
2. Pedhazur, Elazar J. Multiple Regression in Behavioral Research: Explanation and Prediction (3rd ed.). Orlando, FL: Harcourt Brace College Publishers, 1997. pp. 156-194.
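As a numerical check of the procedures above, here is a minimal sketch with simulated data (variable names and effect sizes are invented for illustration). It verifies that correlating the residuals from the two regressions on SES reproduces the partial-correlation formula, and that the semi-partial correlation is never larger in magnitude than the corresponding partial:

```python
import random

def pearson(x, y):
    """Simple Pearson product-moment correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def residualize(y, x):
    """Residuals from the simple regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    intercept = my - slope * mx
    return [b - (intercept + slope * a) for a, b in zip(x, y)]

# Simulated data: SES drives both PA and H (coefficients made up).
random.seed(42)
n = 10000
ses = [random.gauss(0, 1) for _ in range(n)]
pa = [0.6 * s + random.gauss(0, 1) for s in ses]
h = [0.5 * s + 0.4 * p + random.gauss(0, 1) for s, p in zip(ses, pa)]

# Zero-order correlations (y = H, 1 = PA, 2 = SES).
ry1, ry2, r12 = pearson(h, pa), pearson(h, ses), pearson(pa, ses)

# Partial correlation r_y1.2, two ways: residuals vs. formula.
r_partial_resid = pearson(residualize(h, ses), residualize(pa, ses))
r_partial_formula = (ry1 - ry2 * r12) / ((1 - ry2**2) ** 0.5 * (1 - r12**2) ** 0.5)

# Semi-partial r_y(1.2): SES partialed out of PA only.
r_semipartial = (ry1 - ry2 * r12) / (1 - r12**2) ** 0.5

print(abs(r_partial_resid - r_partial_formula) < 1e-9)  # True: the two routes agree
print(abs(r_semipartial) <= abs(r_partial_formula))     # True: |semi-partial| <= |partial|
```

The agreement between the residual route and the formula is exact (up to floating point), since the formula is an algebraic restatement of correlating the residuals.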