Figure 15-3 (p. 512)
Examples of positive and negative relationships. (a) Beer sales are positively related to temperature. (b) Coffee sales are negatively related to temperature.

Figure 15-4 (p. 513)
Examples of different values for linear correlations: (a) shows a strong positive relationship, approximately +0.90; (b) shows a relatively weak negative correlation, approximately –0.40; (c) shows a perfect negative correlation, –1.00; (d) shows no linear trend, 0.00.

The Pearson Correlation
The Pearson correlation, r, measures the direction and degree of the linear (straight-line) relationship between two variables. The magnitude of the Pearson correlation ranges from 0 (indicating no linear relationship between X and Y) to 1.00 (indicating a perfect straight-line relationship between X and Y). The correlation can be either positive or negative, depending on the direction of the relationship.

The Pearson Correlation (cont.)
• r = (degree to which X and Y vary together) / (degree to which X and Y vary separately)
• The Pearson correlation compares the covariability of X and Y (how much they vary together) with the amount X and Y vary separately.
• If there is a perfect linear relationship:
  – Every change in X is matched by a corresponding change in Y.
  – See Figure 15-4(c), which illustrates a perfect negative correlation.
  – For example, when X goes up one unit, Y goes down one unit; when X goes up two units, Y goes down two units.
• So X and Y covary.

The Pearson Correlation (cont.)
• To compute the Pearson correlation:
  – Calculate the variability of the X and Y scores separately by computing SS for each variable: SSX and SSY.
  – Calculate the covariability, which is the sum of products of the deviation scores: SP = Σ(X - MX)(Y - MY)
• The Pearson correlation is the ratio of SP to the square root of the product of SSX and SSY:
  – r = SP / √(SSX · SSY)
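The steps above can be sketched in a few lines of Python. This is only an illustration of the definitional formulas; the function name pearson_r and the four pairs of scores are made up for the sketch and do not come from the textbook.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson r from the definitional formulas: r = SP / sqrt(SSx * SSy)."""
    n = len(xs)
    mx = sum(xs) / n                      # mean of X
    my = sum(ys) / n                      # mean of Y
    ss_x = sum((x - mx) ** 2 for x in xs)  # variability of X alone
    ss_y = sum((y - my) ** 2 for y in ys)  # variability of Y alone
    # Covariability: sum of products of the deviation scores
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sp / sqrt(ss_x * ss_y)

# Hypothetical scores: Y drops by a constant amount for every unit of X,
# so the relationship is perfectly linear and negative.
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))  # prints -1.0
```

Note that the function divides by zero if either variable has no variability (SS = 0); in that case the correlation is undefined.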
Excel file for generating a perfect correlation

            Scores                 Mean    SS
X           1    2    3    4    5     3     10
Y           1    2    3    4    5     3     10
X - MX     -2   -1    0    1    2
Y - MY     -2   -1    0    1    2
Product     4    1    0    1    4     SP = 10

SSX · SSY = 100, so √(SSX · SSY) = 10
r = SP / √(SSX · SSY) = 10 / 10 = 1.00

The Pearson Correlation Calculations: Example 15.2
Calculating SP from the definitional formula, SP = Σ(X - MX)(Y - MY), using the deviation table on p. 515.
Note: the squared-deviation (SS) columns are not in the textbook.

  X    Y    X - MX   Y - MY   (X - MX)²   Product   (Y - MY)²
  1    3      -2       -2         4          +4         4
  2    6      -1       +1         1          -1         1
  4    4      +1       -1         1          -1         1
  5    7      +2       +2         4          +4         4
MX = 3, MY = 5               SSX = 10     SP = 6    SSY = 10

Calculation of the Pearson correlation:
r = SP / √(SSX · SSY)
r = 6 / √((10)(10))
r = 6 / 10 = 0.60

The Pearson Correlation Calculations: Example 15.2 (cont.)

         X     Y     X²     Y²    XY
         1     3      1      9     3
         2     6      4     36    12
         4     4     16     16    16
         5     7     25     49    35
Sum     12    20     46    110    66

SP using the computational formula:
SP = ΣXY - (ΣX)(ΣY)/n
SP = 66 - (12)(20)/4 = 66 - 60 = 6

SS using the computational formula:
SSX = ΣX² - (ΣX)²/n = 46 - (12)²/4 = 10
SSY = ΣY² - (ΣY)²/n = 110 - (20)²/4 = 10

Calculation of the Pearson correlation:
r = SP / √(SSX · SSY) = 6 / √((10)(10)) = 0.60

Calculating the Sum of Products (SP): Example 15.3
Using the definitional formula (Table 15.1, p. 518).

  X     Y
  0     2
 10     6
  4     2
  8     4
  8     6

With SP = 28, SSX = 64, and SSY = 16:
r = SP / √(SSX · SSY)
r = 28 / √((64)(16))
r = 28 / 32 = +0.875

Figure 15.5 (p. 517)
Scatter plot of data from Example 15.3.

Time For More Fun With SPSS

Using and Interpreting the Pearson Correlation
• Prediction:
  – Knowing the relationship between SAT and GPA makes it possible to use SAT scores to predict GPA.
• Validity:
  – When two tests of the same construct, such as “anxiety,” are highly correlated, there is evidence of construct validity.
• Reliability:
  – Test-retest reliability
• Theory verification:
  – When a theory predicts a relationship between two variables, the prediction can be tested with a correlation.
  – Example: amount of sleep is positively related to GPA.

Interpreting Correlations
• Correlations describe relationships
  – but do not explain why they exist
  – Cause-and-effect conclusions cannot be drawn from a correlation alone
  – However, causation is not ruled out either
  – Example: cigarette smoking is positively correlated with cancer
• Correlations are sensitive to the range of scores
• Correlations are sensitive to outliers
• Correlations are not proportions
  – The size of r is not directly proportional to the strength of the relationship (r = 0.50 is not "half" of a perfect relationship)
  – Use r² to interpret the strength of the relationship

Correlations describe relationships
• but do not explain why they exist
• Cause-and-effect conclusions cannot be drawn

Figure 15-6 (p. 522)
Hypothetical data showing the logical relationship between the number of churches and the number of serious crimes for a sample of U.S. cities.

Correlations are sensitive to the range of scores: Problem of Restricted Range
Figure 15-7 (p. 523)
When the full range of X and Y values is used (green ellipse), there is a strong positive correlation. When the X values are restricted to a narrow range of scores (brown circle), the correlation is near zero.

Correlations are sensitive to outliers: Problem of Outliers
Figure 15-8 (p. 524)
A demonstration of how one extreme data point (an outlier) can influence the value of a correlation.
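The outlier problem can be sketched numerically. The scores below are hypothetical (they are not the data behind Figure 15-8), and pearson_r is a helper written for this sketch using the definitional formula r = SP / √(SSX · SSY).

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson r from the definitional formula r = SP / sqrt(SSx * SSy)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sp / sqrt(ss_x * ss_y)

# Hypothetical sample with almost no linear trend
x = [1, 2, 3, 4, 5]
y = [2, 4, 3, 5, 1]
r_without = pearson_r(x, y)

# Same sample plus one extreme point at (15, 15)
r_with = pearson_r(x + [15], y + [15])

print(round(r_without, 2), round(r_with, 2))  # prints -0.1 0.92
```

A single added point moves the correlation from near zero to a strong positive value, which is why a scatter plot should be inspected before an r value is interpreted.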
Correlation and Strength of the Relationship
• Coefficient of determination, r²
  – Using correlation for prediction
    • Using SAT to predict GPA
    • Accuracy of prediction is based on the degree of the relationship
    • The r value itself is not a good measure of predictive accuracy
  – r² measures the proportion of variability in one variable that can be determined from its relationship with the other variable
    • Small, medium, large: see Table 9.3
    • Used as a measure of effect size for the t test
    • Amount of variance in the dependent variable explained by the independent variable

Figure 15.9 (p. 525)
Three sets of data showing three different degrees of linear relationships.

Calculations for the Pearson Correlation Coefficient
• Definitional formula
  – r = SP / √(SSX · SSY)
  – SP = Σ(X - MX)(Y - MY)
• Computational formula
  – r = SP / √(SSX · SSY)
  – SP = ΣXY - (ΣX)(ΣY)/n
• z-score formula (for samples)
  – r = Σ(zX zY) / (n - 1)

The Spearman Correlation
• The Spearman correlation is used in two general situations:
  – (1) X and Y both consist of ranks
    • Because it measures the relationship between two ordinal variables
  – (2) When the relationship is nonlinear
    • The two variables must be converted to ranks before the Spearman correlation is computed
    • Because it measures the consistency of direction of the relationship between two variables

Examples of relationships that are not linear: (a) relationship between reaction time and age; (b) relationship between mood and drug dose.

Fig. 15-14 (p. 536)
Relationship between practice and performance. There is a consistent positive relationship.

The Spearman Correlation (cont.)
The calculation of the Spearman correlation requires:
1. Two variables are observed for each individual.
2. The observations for each variable are rank ordered. Note that the X values and the Y values are ranked separately.
3. After the variables have been ranked, the Spearman correlation is computed by either:
   a. Using the Pearson formula with the ranked data.
   b. Using the special Spearman formula, assuming there are few, if any, tied ranks.

The Spearman Correlation: Formulas and Calculations
Example 15.10: use the ranks for the calculations.

 Original data         Ranks
   X     Y           X     Y    XY    X²    Y²
   3    12           1     5     5     1    25
   4    10           2     3     6     4     9
  10    11           3     4    12     9    16
  11     9           4     2     8    16     4
  12     2           5     1     5    25     1
             Sum    15    15    36    55    55

• SP = ΣXY - (ΣX)(ΣY)/n (computational formula)
• SP = 36 - (15)(15)/5 = -9
• SSX = ΣX² - (ΣX)²/n (computational formula)
• SSX = 55 - (15)²/5 = 10
• SSY = ΣY² - (ΣY)²/n (computational formula)
• SSY = 55 - (15)²/5 = 10
• rs = SP / √(SSX · SSY)
• rs = -9 / √((10)(10)) = -0.90

Scatter plots of original scores and ranks for Example 15.10

The Spearman Correlation: Formulas and Calculations (cont.)
• After the variables have been ranked, the Spearman correlation is computed by either:
  – a. Using the Pearson formula with the ranked data
  – b. Using the special Spearman formula
    • assuming there are few, if any, tied ranks
• Example 15.10: always do the calculations on the ranks
  – rs = 1 - 6ΣD² / (n(n² - 1)) (special formula)
  – rs = 1 - 6(38) / (5(25 - 1)) = 1 - 228/120 = -0.90
  – Do not use the special formula if there are tied scores
  – You are not responsible for this formula on the exam

 Original data         Ranks
   X     Y           X     Y     D    D²
   3    12           1     5     4    16
   4    10           2     3     1     1
  10    11           3     4     1     1
  11     9           4     2    -2     4
  12     2           5     1    -4    16
                           Sum: ΣD² = 38

D is the difference between the two ranks for each individual (here, rank of Y minus rank of X).

Ranking Tied Scores (example from p. 545)

Score           3     3     5     6     6     6    12
Initial rank    1     2     3     4     5     6     7
Final rank      1.5   1.5   3     5     5     5     7

Tied scores each receive the mean of the initial ranks they occupy. Use the Pearson correlation formula on the ranked scores.

The Point-Biserial Correlation as an Alternative to the Pearson Correlation
• The Pearson correlation formula can also be used to measure the relationship between two variables when one or both of the variables are dichotomous.
• A dichotomous variable is one for which there are exactly two categories: for example, men/women or succeed/fail.
• The point-biserial correlation is used when one variable is dichotomous and the other consists of regular numerical scores (interval or ratio scale).

The Point-Biserial Correlation as an Alternative to the Pearson Correlation (cont.)
• The calculation of the point-biserial correlation proceeds as follows:
  – Assign numerical values to the two categories of the dichotomous variable. Traditionally, one category is assigned a value of 0 and the other is assigned a value of 1.
  – Use the regular Pearson correlation formula to calculate the correlation.

The Point-Biserial Correlation as an Alternative to the Pearson Correlation (cont.)
• The point-biserial correlation is closely related to the independent-measures t test introduced in Chapter 10.
• When the data consist of one dichotomous variable and one numerical variable, the dichotomous variable can also be used to separate the individuals into two groups.
• Then, it is possible to compute a sample mean for the numerical scores in each group.

The Point-Biserial Correlation as an Alternative to the Pearson Correlation (cont.)
• In this case, the independent-measures t test can be used to evaluate the mean difference between groups.
• If the effect size for the mean difference is measured by computing r² (the percentage of variance explained), the value of r² will be equal to the value obtained by squaring the point-biserial correlation.

The Phi-Coefficient as an Alternative to the Pearson Correlation
• The phi-coefficient is used when both variables are dichotomous.
• The calculation proceeds as follows:
  – Convert each of the dichotomous variables to numerical values by assigning a 0 to one category and a 1 to the other category for each of the variables.
  – Use the regular Pearson formula with the converted scores.
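The 0/1 coding described above can be sketched in Python. All data here are hypothetical, and pearson_r is a helper written for this sketch. The check against the t test assumes the pooled-variance independent-measures t and the usual effect-size formula r² = t² / (t² + df), which these slides do not spell out.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson r from the definitional formula r = SP / sqrt(SSx * SSy)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sp / sqrt(ss_x * ss_y)

# Point-biserial: hypothetical groups coded 0/1, plus a numerical score
group = [0, 0, 0, 0, 1, 1, 1, 1]
score = [2, 4, 3, 3, 5, 7, 6, 6]
r_pb = pearson_r(group, score)

# Independent-measures t test (pooled variance) on the same scores
g0 = [s for g, s in zip(group, score) if g == 0]
g1 = [s for g, s in zip(group, score) if g == 1]
m0, m1 = sum(g0) / len(g0), sum(g1) / len(g1)
ss0 = sum((s - m0) ** 2 for s in g0)
ss1 = sum((s - m1) ** 2 for s in g1)
df = len(g0) + len(g1) - 2
pooled_var = (ss0 + ss1) / df
t = (m1 - m0) / sqrt(pooled_var * (1 / len(g0) + 1 / len(g1)))
r2_from_t = t ** 2 / (t ** 2 + df)

# The two routes to r-squared should agree
print(round(r_pb ** 2, 4), round(r2_from_t, 4))  # prints 0.8182 0.8182

# Phi-coefficient: both hypothetical variables coded 0/1
a = [0, 0, 0, 0, 1, 1, 1, 1]
b = [0, 0, 0, 1, 0, 1, 1, 1]
phi = pearson_r(a, b)
print(phi)  # prints 0.5
```

Which category receives the 0 and which the 1 is arbitrary; swapping the codes reverses the sign of the correlation but leaves r² unchanged.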