PEARSON CORRELATION

UNDERSTANDING THE PEARSON CORRELATION COEFFICIENT (r)

The Pearson product-moment correlation coefficient (r) assesses the degree to which two quantitative variables are linearly related in a sample. Each individual or case must have scores on both variables, and both variables must be quantitative (i.e., continuous variables measured on an interval or ratio scale). The significance test for r evaluates whether there is a linear relationship between the two variables in the population. Which correlation coefficient is appropriate depends on the scales of measurement of the two variables being correlated.

Two assumptions underlie the significance test associated with a Pearson correlation coefficient between two variables.

Assumption 1: The variables are bivariately normally distributed. If the variables are bivariately normally distributed, each variable is normally distributed ignoring the other variable, and each variable is normally distributed at all levels of the other variable. If the bivariate normality assumption is met, the only type of statistical relationship that can exist between the two variables is a linear relationship. If the assumption is violated, however, a non-linear relationship may exist. It is important to determine whether a non-linear relationship exists between two variables before describing the results with the Pearson correlation coefficient. Non-linearity can be assessed visually by examining a scatterplot of the data points.

Assumption 2: The cases represent a random sample from the population, and the scores on the variables for one case are independent of the scores on the variables for other cases. The significance test for a Pearson correlation coefficient is not robust to violations of the independence assumption. If this assumption is violated, the correlation significance test should not be computed.

SPSS© computes the Pearson correlation coefficient, which is an index of effect size. The index ranges in value from -1.00 to +1.00.
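The following minimal Python sketch makes the definition concrete. It is not SPSS's own code. It converts each score to a deviation from its variable's mean, sums the cross-products of those deviations, and divides by the square root of the product of the two sums of squares. The function name pearson_r and the sample data are hypothetical and are used only for illustration.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson product-moment correlation between paired scores x[i], y[i]."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two cases")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Deviation scores: positive = above the mean (a "high" score), negative = below it.
    dx = [xi - mean_x for xi in x]
    dy = [yi - mean_y for yi in y]
    # Sum of cross-products: positive when high scores pair with high scores
    # and low with low; negative when high scores pair with low scores.
    sxy = sum(a * b for a, b in zip(dx, dy))
    sxx = sum(a * a for a in dx)
    syy = sum(b * b for b in dy)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable has no variance")
    # Dividing by the root of the product of the sums of squares bounds r in [-1, +1].
    return sxy / math.sqrt(sxx * syy)


print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0  (perfect positive)
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))  # -1.0 (perfect negative)
```

The two calls show the ends of the -1.00 to +1.00 range. Scores that rise together give r = 1.0, and scores that move in opposite directions give r = -1.0.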
The Pearson correlation coefficient indicates the degree to which low or high scores on one variable tend to go with low or high scores on the other variable. A score is low (or high) to the extent that it falls below (or above) the mean score on that variable. As with all effect size indices, there is no single answer to the question, "What value indicates a strong relationship between two variables?" What counts as large or small depends on the discipline within which the research question is being asked.

If one variable is thought of as the predictor (X) and the other as the criterion (Y), squaring the correlation coefficient helps interpret the strength of the relationship. The square of the correlation (r²) gives the proportion of criterion variance that is accounted for by its linear relationship with the predictor. In other words, r² equals the proportion of the total variance in Y that can be associated with the variance in X. The square of the correlation coefficient is called the coefficient of determination. For example, r = .422 gives r² ≈ .178, so roughly 18% of the variance in Y is associated with X.

CONDUCTING PEARSON CORRELATION COEFFICIENTS

1. Open the data file.
2. Click Analyze, then Correlate, then Bivariate. You will see the Bivariate Correlations dialog box.
3. Select the variables of interest.
   a. You can double-click each variable to move it into the Variables box. This works because the Variables box is the only place a variable can be moved to.
   b. Alternatively, hold down the Ctrl key, click the desired variables (three in this example), and click ► to move them to the Variables box.
4. Make sure Pearson is selected in the Correlation Coefficients area.
5. Make sure the Two-tailed option is selected in the Test of Significance box (unless you have an a priori reason to select One-tailed).
6. Click Options. You will see the Bivariate Correlations: Options dialog box.
7. Click Means and standard deviations in the Statistics box.
8. Click Continue.
9a. (For total sample information) Click OK.
   You should be in the Output1 – SPSS Viewer and are now ready to examine the output.
9b. (For subgroup information, e.g., Male and Female) Click Paste. You should be in the Syntax1 – SPSS Syntax Editor screen. The requested syntax should look like the following (based on the Male/Female example with the Mathach, Visual, and Mosaic variables being correlated):

    CORRELATIONS
      /VARIABLES=mathach visual mosaic
      /PRINT=TWOTAIL NOSIG
      /STATISTICS DESCRIPTIVES
      /MISSING=PAIRWISE .

   This assumes you have coded the GENDER variable with 0 = Male and 1 = Female. Copy the syntax and paste it twice below the original, leaving at least one blank line between commands, so that the editor holds three copies of the command.

   Add the following syntax ahead of the second command:

    temporary.
    select if (gender eq 0).

   Add the following syntax ahead of the third command:

    temporary.
    select if (gender eq 1).

   Your screen should now look like the following:

    CORRELATIONS
      /VARIABLES=mathach visual mosaic
      /PRINT=TWOTAIL NOSIG
      /STATISTICS DESCRIPTIVES
      /MISSING=PAIRWISE .

    temporary.
    select if (gender eq 0).
    CORRELATIONS
      /VARIABLES=mathach visual mosaic
      /PRINT=TWOTAIL NOSIG
      /STATISTICS DESCRIPTIVES
      /MISSING=PAIRWISE .

    temporary.
    select if (gender eq 1).
    CORRELATIONS
      /VARIABLES=mathach visual mosaic
      /PRINT=TWOTAIL NOSIG
      /STATISTICS DESCRIPTIVES
      /MISSING=PAIRWISE .

   Because TEMPORARY limits the SELECT IF to the next procedure only, the three commands analyze the total sample, the male students, and the female students, in that order.

   Now you are ready to run the correlation analysis:
   Option 1: Click Run (on the menu bar), then click All.
   Option 2: Highlight all of the desired syntax, then click ► (or click Run, then click Selection).
   You should be in the Output1 – SPSS Viewer. Now you are ready to examine the output.

For SPSS version 12.0 (and earlier versions), SPSS uses two asterisk flags. A single asterisk (*) marks a correlation that is significant at the .05 level. A double asterisk (**) marks a correlation that is significant at the .01 level.
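The asterisks rest on a test of the null hypothesis that the population correlation is 0. The usual form of this test converts r to t = r√(N − 2) / √(1 − r²), with N − 2 degrees of freedom. The Python sketch below assumes that form. The Python standard library has no t distribution, so the sketch approximates the two-tailed p value by integrating Student's t density with Simpson's rule. All function names are hypothetical. The flag helper only mimics the * / ** convention described above and is not SPSS code.

```python
import math


def t_statistic(r: float, n: int) -> tuple[float, int]:
    """t statistic for H0: population correlation = 0, with df = n - 2."""
    df = n - 2
    return r * math.sqrt(df) / math.sqrt(1.0 - r * r), df


def t_density(x: float, df: int) -> float:
    """Probability density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1.0 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t: float, df: int, steps: int = 10_000) -> float:
    """Two-tailed p value: 1 minus twice the area under the t density from 0 to |t|.
    The area is approximated with Simpson's rule (steps must be even)."""
    b = abs(t)
    h = b / steps
    total = t_density(0.0, df) + t_density(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    area = total * h / 3
    return max(0.0, 1.0 - 2.0 * area)


def flag(p: float) -> str:
    """Mimic the asterisk convention: ** if p < .01, * if p < .05, else nothing."""
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""


# Male-group example from this handout: r = .422 with N = 220 cases.
t, df = t_statistic(0.422, 220)
p = two_tailed_p(t, df)
print(f"t({df}) = {t:.2f}, p = {p:.1e} {flag(p)}")
```

For the male-group values, t is about 6.87 on 218 degrees of freedom. The p value is far below .001, so the correlation receives the ** flag.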
A significant correlation is one that differs significantly from 0 (zero), the value specified by the null hypothesis. The output shows the probability (p) values for the significance tests (one-tailed or two-tailed, depending on your selection), along with the sample size (N). Note that the upper-right triangle of the matrix (referred to as the Upper Diagonal, UD) repeats the information in the lower-left triangle (referred to as the Lower Diagonal, LD), so one of the two can be ignored.

If several correlations are computed, you may wish to use a corrected significance level to reduce the chance of making a Type I error. One possible method is the Bonferroni approach, which divides .05 by the number of correlations computed. A correlation coefficient is then considered significant only if its p value is less than the corrected significance level. For example, with the three correlations shown below, the corrected level is .05 / 3 ≈ .017. When this correction is needed is widely debated, and most statistics texts (ours included) do not address the issue, so no "rule of thumb" is offered here.

SPSS presents the correlations in tabular form. However, correlations are often reported within the text of a manuscript. For example: "The correlation between the mathematics test score and the visualization test score for male students was significant, r(218) = .422, p < .001." The number in parentheses is the degrees of freedom associated with the significance test, which equals the number of cases minus 2 (N – 2). As shown in the output below, the male group has 220 cases for this correlation, so the degrees of freedom are 220 – 2 = 218. SPSS displays very small p values as .000, which should be reported as p < .001 rather than p = .000.

    Correlations
                                           MATHACH       VISUAL          MOSAIC
                                           Mathematics   Visualization   Mosaic
                                           Test Score    Test            Test
    MATHACH        Pearson Correlation     1             .422**          .298**
    Mathematics    Sig. (2-tailed)         .             .000            .000
    Test Score     N                       220           220             220
    VISUAL         Pearson Correlation     .422**        1               .330**
    Visualization  Sig. (2-tailed)         .000          .               .000
    Test           N                       220           220             220
    MOSAIC         Pearson Correlation     .298**        .330**          1
    Mosaic Test    Sig. (2-tailed)         .000          .000            .
                   N                       220           220             220
    **. Correlation is significant at the 0.01 level (2-tailed).

USING SPSS GRAPHS TO DISPLAY CORRELATION RESULTS

Scatterplots are rarely included in the results sections of manuscripts, but they should be included more often because they visually represent the relationship between the variables. A correlation coefficient summarizes the relationship between two variables with a single value, whereas a scatterplot gives a rich descriptive picture of that relationship. A scatterplot can also show whether a few extreme scores (outliers) unduly influence the value of the correlation coefficient, or whether non-linear relationships exist between the variables.

To create scatterplots among your variables, follow these steps:
1. Click Graphs (on the menu bar), then click Scatter. You will be at the Scatterplot dialog box.
2. Click Simple, then click Define. You will see the Simple Scatterplot dialog box.
3. Select your variables for the Y axis and the X axis, then click OK to create the plot. (You can add titles by selecting the Title option. Once you have typed your titles, click Continue to return to the Simple Scatterplot dialog box.)
   NOTE: If you want to create group-specific scatterplots, click Paste and adjust the syntax before running it (see step 9b above).
4. You can create a scatterplot matrix of several variables by following these steps:
   a. Click Graphs, then click Scatter. (You will be in the Scatterplot dialog box.)
   b. Click Matrix, then click Define. (You will be in the Scatterplot Matrix dialog box.)
   c. Holding down the Ctrl key, click the desired variables.
   d. Click ► to move them to the Matrix Variables box.
   e. Click OK.
   NOTE: If you want to create group-specific scatterplot matrices, click Paste and adjust the syntax before running it.
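Two small made-up data sets show why the scatterplot matters. The data are hypothetical and for illustration only. In the first set, Y is a perfect U-shaped function of X, yet r is 0. In the second set, five scores with no linear trend reach an r of about .98 once a single extreme case is added. Both patterns would be obvious on a scatterplot but are invisible in the coefficient alone. The pearson_r helper is a hypothetical name for a plain deviation-score computation.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson r computed from deviation scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [xi - mean_x for xi in x]
    dy = [yi - mean_y for yi in y]
    sxy = sum(a * b for a, b in zip(dx, dy))
    return sxy / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


# Hypothetical data set 1: a perfect U-shaped (non-linear) relationship, y = x**2.
x_curve = [-2, -1, 0, 1, 2]
y_curve = [xi ** 2 for xi in x_curve]

# Hypothetical data set 2: five cases with no linear trend, plus one extreme case.
x_out = [1, 2, 3, 4, 5]
y_out = [2, 1, 2, 1, 2]

r_curve = pearson_r(x_curve, y_curve)
r_without = pearson_r(x_out, y_out)
r_with = pearson_r(x_out + [20], y_out + [20])

print(f"U-shaped relationship:  r = {r_curve:.3f}")   # r = 0.000
print(f"without the outlier:    r = {r_without:.3f}")  # r = 0.000
print(f"with one outlier added: r = {r_with:.3f}")     # r = 0.978
```

In the first data set, Y depends completely on X, yet r is 0 because the relationship is not linear (a violation of Assumption 1). In the second, one case creates nearly all of the correlation.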
FINAL NOTE: The Bivariate Correlations procedure can also compute Spearman's rho (ρ) when the variables are measured on an ordinal scale, such as rankings. To request it, select Spearman in the Correlation Coefficients area.
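Spearman's rho is the Pearson correlation computed on the ranks of the scores rather than on the raw scores. The minimal sketch below assumes that tied scores share the average of the ranks they occupy, a common convention. The function names and data are hypothetical. With Y = X³, the relationship is perfectly monotonic but not linear, so rho is exactly 1 while Pearson's r is about .94.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson r computed from deviation scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [xi - mean_x for xi in x]
    dy = [yi - mean_y for yi in y]
    sxy = sum(a * b for a, b in zip(dx, dy))
    return sxy / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


def average_ranks(values: list[float]) -> list[float]:
    """Rank scores from 1 (smallest); tied scores share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of scores tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x: list[float], y: list[float]) -> float:
    """Spearman's rho: Pearson r applied to the ranks of each variable."""
    return pearson_r(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
y = [xi ** 3 for xi in x]  # hypothetical: monotonic but not linear
print(f"Pearson r    = {pearson_r(x, y):.3f}")    # 0.943
print(f"Spearman rho = {spearman_rho(x, y):.3f}")  # 1.000
print(average_ranks([10, 20, 20, 30]))             # [1.0, 2.5, 2.5, 4.0]
```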