Correlations
MEASURES OF RELATIONSHIP

Key Concepts
- Pearson correlation: interpretation, limits, computation, graphing
- Factors that affect the Pearson correlation
- Coefficient of determination (r²): "variance explained"
- Correlation vs. causation

Correlations
A correlation measures the linear relationship between two variables.

Correlation: Scatterplots
Scatterplots are graphic representations of the relationship between two continuous variables.
[Figure: example scatterplot of Weight (y-axis, 0 to 120) against Age (x-axis, 0 to 12)]

Correlation: Coefficients
Correlation coefficients are numbers between -1.00 and +1.00 that represent the relationship between two variables.
[Scale: -1 ... 0 ... +1]

Stop and think
What types of variables are correlated in education? Can you provide examples of both positive and negative relationships?

The Ugly Formula
...the variance formula for r:

$$r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \, \sum (Y - \bar{Y})^2}}$$

This formula calculates the correlation between X and Y. It builds on your knowledge of variance, showing how the variation in X and Y, together with the covariation between X and Y, make up the Pearson correlation coefficient.

Example

Step 1: Lay out the problem

Age (X)   Weight (Y)
   7          70
   4          50
   9         100
   3          25
   5          55
   4          40
   6          75
  10          90
  10          25

Step 2: Compute the mean for both variables
Sum of X = 58; number of X = 9; mean of X = 6.44
Sum of Y = 530; number of Y = 9; mean of Y = 58.89

Step 3: Compute the difference of each score from its mean

Age (X)   X-Xbar   Weight (Y)   Y-Ybar
   7        .56        70        11.11
   4      -2.44        50        -8.89
   9       2.56       100        41.11
   3      -3.44        25       -33.89
   5      -1.44        55        -3.89
   4      -2.44        40       -18.89
   6       -.44        75        16.11
  10       3.56        90        31.11
  10       3.56        25       -33.89

Mean of X = 6.44; mean of Y = 58.89
Note: The sum of (X-Xbar) should equal 0, and the sum of (Y-Ybar) should equal 0. Why?
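As an illustrative sketch (not part of the original lesson), Steps 1 to 3 can be reproduced in Python. The helper names `mean` and `deviations` are hypothetical choices made for this example.

```python
# Step 1: lay out the problem. Age (X) and weight (Y) for nine people.
ages = [7, 4, 9, 3, 5, 4, 6, 10, 10]
weights = [70, 50, 100, 25, 55, 40, 75, 90, 25]


def mean(values):
    """Step 2: the arithmetic mean (sum of scores / number of scores)."""
    return sum(values) / len(values)


def deviations(values):
    """Step 3: each score minus the mean of its own variable."""
    m = mean(values)
    return [v - m for v in values]


x_dev = deviations(ages)
y_dev = deviations(weights)

print(f"Mean of X = {mean(ages):.2f}, Mean of Y = {mean(weights):.2f}")
print("X - Xbar:", [round(d, 2) for d in x_dev])
print("Y - Ybar:", [round(d, 2) for d in y_dev])
# Deviations from the mean sum to zero, apart from tiny floating-point error.
print("Sums:", round(sum(x_dev), 10), round(sum(y_dev), 10))
```

The rounded deviations printed by the sketch should match the Step 3 table, including the -1.44 for the person aged 5.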
Step 4: Compute the square of each mean difference

Age (X)  X-Xbar  (X-Xbar)²   Weight (Y)  Y-Ybar   (Y-Ybar)²
   7       .56     .3136        70        11.11    123.4321
   4     -2.44    5.9536        50        -8.89     79.0321
   9      2.56    6.5536       100        41.11   1690.0321
   3     -3.44   11.8336        25       -33.89   1148.5321
   5     -1.44    2.0736        55        -3.89     15.1321
   4     -2.44    5.9536        40       -18.89    356.8321
   6      -.44     .1936        75        16.11    259.5321
  10      3.56   12.6736        90        31.11    967.8321
  10      3.56   12.6736        25       -33.89   1148.5321

Step 5: Sum the squared differences from the means
Sum (X-Xbar)² = 58.22
Sum (Y-Ybar)² = 5788.89

Step 6: Compute the cross-product of the differences (for the numerator)

X-Xbar   Y-Ybar   (X-Xbar)(Y-Ybar)
   .56    11.11         6.2216
 -2.44    -8.89        21.6916
  2.56    41.11       105.2416
 -3.44   -33.89       116.5816
 -1.44    -3.89         5.6016
 -2.44   -18.89        46.0916
  -.44    16.11        -7.0884
  3.56    31.11       110.7516
  3.56   -33.89      -120.6484

Step 7: Sum the cross-products of the differences
Sum (X-Xbar)(Y-Ybar) = 284.4444

Step 8: Collect the partial values, substitute each into the formula, and solve.
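A minimal Python sketch of Steps 4 to 7, assuming the same age and weight lists. `sum_of_squares` and `sum_of_cross_products` are illustrative names, not from any library. Because it keeps full precision for the means, its sums agree with the hand-worked values to two decimal places rather than digit for digit.

```python
ages = [7, 4, 9, 3, 5, 4, 6, 10, 10]
weights = [70, 50, 100, 25, 55, 40, 75, 90, 25]


def mean(values):
    return sum(values) / len(values)


def sum_of_squares(values):
    """Steps 4-5: square each deviation from the mean, then add them up."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def sum_of_cross_products(xs, ys):
    """Steps 6-7: multiply each pair of deviations, then add them up."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys))


ss_x = sum_of_squares(ages)
ss_y = sum_of_squares(weights)
sp_xy = sum_of_cross_products(ages, weights)

print(f"Sum (X-Xbar)^2       = {ss_x:.2f}")
print(f"Sum (Y-Ybar)^2       = {ss_y:.2f}")
print(f"Sum (X-Xbar)(Y-Ybar) = {sp_xy:.2f}")
```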
$$r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \, \sum (Y - \bar{Y})^2}}$$

Sum (X-Xbar)² = 58.22
Sum (Y-Ybar)² = 5788.89
Sum (X-Xbar)(Y-Ybar) = 284.44

$$r = \frac{284.44}{\sqrt{(58.22)(5788.89)}} = \frac{284.44}{580.54} \approx .49$$

Last Step: Check the computed r for reasonableness, then interpret its sign and magnitude
- The value of r must be between -1 and +1. The computed r = .49 is between -1 and +1.
- The sign of r is positive, so the relationship between the two variables is positive:
  "In general, younger people weigh less than older people."
  "In general, older people weigh more than younger people."
- The magnitude of r is moderate. Although age and weight are related, the relationship is not very strong: some of the variation in age has nothing to do with weight, and some of the variation in weight has nothing to do with age.

Cautions
- When two variables have a curvilinear relationship, r underestimates the strength of that relationship.
- The size of the group does not affect the size of the correlation coefficient (although smaller groups give less stable estimates from sample to sample).

Effect Size
ES = the correlation coefficient squared (r²): the proportion of the total variance of one variable that can be associated with the variance in the other variable. In other words, it is the proportion of shared, or common, variance between the two variables.
[Figure: overlapping circles labeled CAL (calorie intake) and WEIGHT; the overlap represents r² = .36]
Example: for calorie intake and weight, r = .60, so r² = .36, or 36% shared variance.

Correlation & Causality
Correlation does not indicate causation; a correlation indicates only a relationship, or association.

Practice
Compute the Pearson correlation and r² value for the following example. Be sure to draw a rough sketch of a scatterplot to see whether the relationship looks linear.
X: 3, 7, 8, 2, 5
Y: 5, 8, 10, 3, 9
Interpret your results.
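The full variance formula, together with the curvilinear caution, can be checked with a short sketch. `pearson_r` is a hypothetical helper written only for this illustration; it raises an error when either variable has no variation, because r is undefined in that case. The second part uses made-up points where Y = X² exactly, to show that a perfect but curved relationship can still produce r = 0.

```python
import math

ages = [7, 4, 9, 3, 5, 4, 6, 10, 10]
weights = [70, 50, 100, 25, 55, 40, 75, 90, 25]


def pearson_r(xs, ys):
    """Pearson r via the variance formula:
    sum of cross-products / sqrt(sum of squares X * sum of squares Y)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("r is undefined when a variable has no variation")
    return sp / math.sqrt(ss_x * ss_y)


r = pearson_r(ages, weights)
print(f"r = {r:.2f}, r^2 = {r * r:.2f}")

# Caution: Y is completely determined by X here, but the relationship is
# a symmetric curve, so r (which only measures linear fit) comes out as 0.
xs = [-2, -1, 0, 1, 2]
ys = [x ** 2 for x in xs]
print(f"curvilinear r = {pearson_r(xs, ys):.2f}")
```

If you have Python 3.10 or later, the standard library's `statistics.correlation(ages, weights)` also computes Pearson's r and can serve as a cross-check.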
Practice solution

  X    Y   X-Xbar   Y-Ybar   (X-Xbar)²   (Y-Ybar)²   (X-Xbar)(Y-Ybar)
  3    5     -2       -2         4           4              4
  7    8      2        1         4           1              2
  8   10      3        3         9           9              9
  2    3     -3       -4         9          16             12
  5    9      0        2         0           4              0

Xbar = 5; Ybar = 7
Check: Sum (X-Xbar) = 0; Sum (Y-Ybar) = 0
Sum (X-Xbar)² = 26; Sum (Y-Ybar)² = 34; Sum (X-Xbar)(Y-Ybar) = 27

$$r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \, \sum (Y - \bar{Y})^2}} = \frac{27}{\sqrt{(26)(34)}} = \frac{27}{29.7} \approx .908, \qquad r^2 \approx .82$$

Key Points
- Correlation is a measure of relationship and ranges from -1 to +1.
- The sign indicates the direction of the relationship, and the size of the coefficient indicates its strength.
- r² represents the shared variance.
- Correlations do not imply causality.
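To check a hand solution to the practice problem, a sketch like the following (illustrative only; the variable names are our own) recomputes each column of the table and the final r and r².

```python
import math

# Practice data from the exercise.
x = [3, 7, 8, 2, 5]
y = [5, 8, 10, 3, 9]

x_bar = sum(x) / len(x)
y_bar = sum(y) / len(y)
x_dev = [xi - x_bar for xi in x]   # X - Xbar column
y_dev = [yi - y_bar for yi in y]   # Y - Ybar column

ss_x = sum(d * d for d in x_dev)                        # Sum (X-Xbar)^2
ss_y = sum(d * d for d in y_dev)                        # Sum (Y-Ybar)^2
sp_xy = sum(dx * dy for dx, dy in zip(x_dev, y_dev))    # Sum of cross-products

r = sp_xy / math.sqrt(ss_x * ss_y)

print("X - Xbar:", x_dev)
print("Y - Ybar:", y_dev)
print(f"SS_x = {ss_x}, SS_y = {ss_y}, SP = {sp_xy}")
print(f"r = {r:.3f}, r^2 = {r * r:.2f}")
```

Note that each deviation column sums to zero, which is a quick way to catch an arithmetic slip before moving on to the squares and cross-products.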