UNDERSTANDING THE PEARSON CORRELATION

PEARSON CORRELATION
UNDERSTANDING THE PEARSON CORRELATION COEFFICIENT (r)
The Pearson product-moment correlation coefficient (r) assesses the degree to which two
quantitative variables are linearly related in a sample. Each individual or case must have scores on two
quantitative variables (i.e., continuous variables measured on the interval or ratio scales). The
significance test for r evaluates whether there is a linear relationship between the two variables
in the population. The appropriate correlation coefficient depends on the scales of measurement
of the two variables being correlated.
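As a minimal sketch of what the coefficient measures, the product-moment formula can be written out in a few lines of Python. The function name pearson_r and the five pairs of scores are made up for illustration; they are not taken from the data file used in this handout, and SPSS remains the tool used for the actual analysis.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation for two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 cases")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products of each case's deviations from the two means
    sp = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each variable
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return sp / math.sqrt(ss_x * ss_y)

# Hypothetical scores for five cases (illustration only)
visual = [2, 4, 5, 7, 9]
mathach = [10, 14, 13, 18, 21]
r = pearson_r(visual, mathach)
print(f"r = {r:.3f}")
```

Each case contributes one pair of scores, which is why both variables must be measured on every case.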
There are two assumptions underlying the significance test associated with a Pearson correlation
coefficient between two variables.
Assumption 1: The variables are bivariately normally distributed.
If the variables are bivariately normally distributed, each variable is normally distributed
ignoring the other variable and each variable is normally distributed at all levels of the
other variable. If the bivariate normality assumption is met, the only type of statistical
relationship that can exist between two variables is a linear relationship. However, if the
assumption is violated, a non-linear relationship may exist. It is important to determine if
a non-linear relationship exists between two variables before describing the results using
the Pearson correlation coefficient. Non-linearity can be assessed visually by examining a
scatterplot of the data points.
Assumption 2: The cases represent a random sample from the population and the scores on
variables for one case are independent of scores on the variables for other cases.
The significance test for a Pearson correlation coefficient is not robust to violations of the
independence assumption. If this assumption is violated, the correlation significance test
should not be computed.
SPSS© computes the Pearson correlation coefficient, an index of effect size. The index ranges in
value from -1.00 to +1.00. This coefficient indicates the degree to which low or high scores on one
variable tend to go with low or high scores on another variable. A score on a variable is a low (or
high) score to the extent that it falls below (or above) the mean score on that variable. As with all
effect size indices, there is no good answer to the question, “What value indicates a strong
relationship between two variables?” What is large or small depends on the discipline within
which the research question is being asked.
If one variable is thought of as the predictor and another variable as the criterion, we can square
the correlation coefficient to interpret the strength of the relationship. The square of the
correlation (r²) gives the proportion of criterion variance that is accounted for by its linear
relationship with the predictor. In other words, the square of the correlation coefficient equals the
proportion of the total variance in Y that can be associated with the variance in X. The square of
the correlation coefficient is called the coefficient of determination.
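As a small worked example, the coefficient of determination can be computed from the correlation of .422 between the mathematics test score and the visualization test that appears in the sample output in this handout. The variable names are illustrative only.

```python
# Correlation between MATHACH and VISUAL reported in the handout's sample output
r = 0.422

# Coefficient of determination: proportion of criterion variance
# accounted for by the linear relationship with the predictor
r_squared = r ** 2

print(f"r^2 = {r_squared:.3f}")  # prints "r^2 = 0.178"
```

In other words, roughly 18% of the variance in one test is associated with variance in the other.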
CONDUCTING PEARSON CORRELATION COEFFICIENTS
1. Open the data file
2. Click Analyze
then click Correlate
then click Bivariate
You will see the Bivariate Correlations dialog box.
3. Select the variables of interest
a. You can double-click each variable to move it into the Variables box (there is
only one box to move it to).
b. You can hold down the Ctrl key, click each desired variable, and click ► to
move them to the Variables box.
4. Make sure Pearson is selected in the Correlation Coefficients area.
5. Make sure the Two-tailed option is selected in the Test of Significance box (unless you
have some a priori reason to select one-tailed).
6. Click Options. You will see the Bivariate Correlations: Options dialog box.
7. Click Means and standard deviations in the Statistics box.
8. Click Continue.
9a. (For Total Sample information) click OK.
You should be in the Output1 – SPSS Viewer and are now ready to examine the output.
9b. For determining the information for subgroups (e.g., Male and Female) click Paste.
You should be in the Syntax1 – SPSS Syntax Editor screen – and the requested syntax
should look like the following (based on the Male/Female example with Mathach, Visual,
and Mosaic variables being correlated)
CORRELATIONS
/VARIABLES=mathach visual mosaic
/PRINT=TWOTAIL NOSIG
/STATISTICS DESCRIPTIVES
/MISSING=PAIRWISE .
Provided that you have coded the GENDER variable with 0 = Male and 1 = Female…
Copy the syntax and paste it twice into your syntax editor so that you have three copies
of the command (leaving at least one empty line between the syntax commands). Add the
following syntax ahead of the second command:
temporary.
select if (gender eq 0).
Add the following syntax ahead of the third command:
temporary.
select if (gender eq 1).
Your screen should now look like the following:
CORRELATIONS
/VARIABLES=mathach visual mosaic
/PRINT=TWOTAIL NOSIG
/STATISTICS DESCRIPTIVES
/MISSING=PAIRWISE .
temporary.
select if (gender eq 0).
CORRELATIONS
/VARIABLES=mathach visual mosaic
/PRINT=TWOTAIL NOSIG
/STATISTICS DESCRIPTIVES
/MISSING=PAIRWISE .
temporary.
select if (gender eq 1).
CORRELATIONS
/VARIABLES=mathach visual mosaic
/PRINT=TWOTAIL NOSIG
/STATISTICS DESCRIPTIVES
/MISSING=PAIRWISE .
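The logic of the three commands above (total sample, then males only, then females only) can be sketched outside SPSS as follows. This is an illustration in Python under stated assumptions: the eight cases are made up, the helper names pearson_r and r_for are hypothetical, and only two of the three variables are used to keep the sketch short.

```python
import math

def pearson_r(x, y):
    """Pearson correlation for two equal-length lists of scores."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sp = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return sp / math.sqrt(ss_x * ss_y)

# Hypothetical cases: (gender, mathach, visual), with 0 = Male and 1 = Female
cases = [
    (0, 10, 2), (0, 14, 4), (0, 13, 5), (0, 18, 7),
    (1, 12, 3), (1, 11, 6), (1, 17, 8), (1, 20, 9),
]

def r_for(rows):
    """Correlate mathach with visual for the given subset of cases."""
    return pearson_r([m for _, m, _ in rows], [v for _, _, v in rows])

results = {
    "total": r_for(cases),                              # no selection
    "male": r_for([c for c in cases if c[0] == 0]),     # like select if (gender eq 0)
    "female": r_for([c for c in cases if c[0] == 1]),   # like select if (gender eq 1)
}
for group, value in results.items():
    print(f"{group}: r = {value:.3f}")
```

As with the temporary command in SPSS, each selection applies to one analysis only and the full set of cases is left unchanged.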
Now you are ready to run the correlation analysis…
Option 1:
Click Run (on the menu bar)
then click All
Option 2:
Highlight all of the desired syntax
then click ►
(or) Click Run, and then click Selection
You should be in the Output1 – SPSS Viewer
Now you are ready to examine the output.
For SPSS version 12.0 (and earlier versions):
SPSS uses a single asterisk (*) to indicate whether a particular correlation is
significant at the .05 level and a double asterisk (**) to indicate whether a particular
correlation is significant at the .01 level. A significant correlation is one that is
significantly different from 0 (zero), the value stated by the Null Hypothesis.
Probability (p) values associated with the significance tests (one-tailed or two-tailed
depending on your selection) for the correlations are shown.
Sample size (N) is also shown.
Note that the information in the upper-right triangle (referred to as the Upper
Diagonal, UD) of the matrix is redundant with the information in the lower-left
triangle (referred to as the Lower Diagonal, LD) of the matrix – one can be ignored.
If several correlations are computed, you may wish to consider a corrected significance
level to minimize the chances of making a Type I error. One possible method is the
Bonferroni approach, which requires dividing .05 by the number of computed
correlations. A correlation coefficient would not be significant unless its p value is less
than the corrected significance level. When this correction is needed is widely debated,
and most statistics texts (ours included) do not address the issue, so no
“rule of thumb” will be offered here.
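The arithmetic of the Bonferroni approach can be sketched as follows. The count of three correlations matches the three pairs among mathach, visual, and mosaic; the p values in the list are hypothetical and the function name bonferroni_alpha is made up for this illustration.

```python
def bonferroni_alpha(alpha, n_tests):
    """Corrected significance level: the overall alpha divided by the number of tests."""
    return alpha / n_tests

n_correlations = 3  # three pairwise correlations among three variables
corrected = bonferroni_alpha(0.05, n_correlations)

# Hypothetical p values, for illustration only
p_values = [0.0004, 0.020, 0.030]
significant = [p < corrected for p in p_values]

print(f"corrected alpha = {corrected:.4f}")
print(significant)
```

With three correlations the corrected level is about .0167, so a p value of .020 that would pass at .05 no longer counts as significant.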
SPSS presents the correlations in tabular form. However, correlations are often presented
within the text of a manuscript. For example, “The correlation between the mathematics
test score and visualization test for male students was significant, r (218) = .422,
p < .001.” The number in the parentheses represents the degrees of freedom associated
with the significance test, which is equal to the number of cases minus 2 (or N – 2). As
shown on the output below, the number of cases for the male group for this correlation is
220 and, therefore, the degrees of freedom are 220 – 2 = 218.
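The degrees of freedom, and the t statistic that is commonly used to test whether r differs from zero, can be worked out by hand as in the sketch below. The conversion t = r·sqrt(df / (1 − r²)) is the standard textbook formula; it is offered here as an assumption about how the test can be reproduced, not as a description of SPSS internals. The function name r_to_t is made up.

```python
import math

def r_to_t(r, n):
    """Return (t, df) for testing a Pearson r against zero with n cases."""
    df = n - 2
    t = r * math.sqrt(df / (1 - r ** 2))
    return t, df

# Values from the male-group example: r = .422 with N = 220
t, df = r_to_t(0.422, 220)
print(f"df = {df}, t = {t:.2f}")
```

The p value itself comes from the t distribution with these degrees of freedom, which SPSS reports in the Sig. (2-tailed) row.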
Correlations
                                                      MATHACH        VISUAL           MOSAIC
                                                      Mathematics    Visualization    Mosaic
                                                      Test Score     Test             Test
MATHACH Mathematics Test Score   Pearson Correlation  1              .422**           .298**
                                 Sig. (2-tailed)      .              .000             .000
                                 N                    220            220              220
VISUAL Visualization Test        Pearson Correlation  .422**         1                .330**
                                 Sig. (2-tailed)      .000           .                .000
                                 N                    220            220              220
MOSAIC Mosaic Test               Pearson Correlation  .298**         .330**           1
                                 Sig. (2-tailed)      .000           .000             .
                                 N                    220            220              220
**. Correlation is significant at the 0.01 level (2-tailed).
USING SPSS GRAPHS TO DISPLAY CORRELATION RESULTS
Scatterplots are rarely included in results sections of manuscripts, but they should be included
more often because they visually represent the relationship between the variables. While a
correlation coefficient tries to summarize the relationship between two variables with a single
value, a scatterplot gives a rich descriptive picture of this relationship. In addition, the scatterplot
can show whether a few extreme scores (outliers) overly influence the value of the correlation
coefficient or whether non-linear relationships exist between variables.
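The influence of a single extreme score can be illustrated numerically. The sketch below uses made-up scores and a hypothetical helper, pearson_r; it compares the correlation for six cases with the correlation after one outlying case is added.

```python
import math

def pearson_r(x, y):
    """Pearson correlation for two equal-length lists of scores."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sp = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return sp / math.sqrt(ss_x * ss_y)

# Hypothetical scores with only a modest linear relationship
x = [1, 2, 3, 4, 5, 6]
y = [3, 1, 4, 2, 5, 3]
r_without = pearson_r(x, y)

# The same cases plus one extreme case far from the rest
r_with = pearson_r(x + [20], y + [20])

print(f"without outlier: r = {r_without:.3f}")
print(f"with outlier:    r = {r_with:.3f}")
```

A scatterplot of the second data set would show six clustered points and one distant point, which is exactly the pattern the coefficient alone cannot reveal.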
To create scatterplots among your variables, follow these steps:
1. Click Graphs (on the menu bar)
then click Scatter
You will be at the Scatterplot dialog box
2. Click Simple
then click Define
You will see the Simple Scatterplot dialog box
3. Select your variable for the Y axis and the X axis
(You can add titles by selecting the Title option. Once you have your titles typed in, click
Continue to return to the Simple Scatterplot dialog box.)
then click OK to run (create)
NOTE: If you want to create group-specific scatterplots, you will need to click
Paste and adjust your syntax command before running (see above).
4. You can create multiple variable scatterplots by following these steps:
a. Click Graphs
then click Scatter (You will be in the Scatterplot dialog box)
b. Click Matrix
then click Define (You will be in the Scatterplot Matrix dialog box)
c. Holding down the Ctrl key, click the desired variables
d. Click ► to move them to the Matrix Variables box.
e. Click OK.
NOTE: If you want to create group-specific scatterplots, you will need to click
Paste and adjust your syntax command before running.
FINAL NOTE: The Bivariate Correlation procedure can also compute the Spearman rho (ρ) if
the measurement scales underlying the variables are rankings, which are ordinal data.
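Spearman rho is the Pearson correlation computed on ranks rather than on the original scores. A minimal sketch in Python, assuming tied scores receive the average of their ranks; the helper names and the example scores are made up for illustration.

```python
import math

def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson_r(x, y):
    """Pearson correlation for two equal-length lists of scores."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sp = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return sp / math.sqrt(ss_x * ss_y)

def spearman_rho(x, y):
    """Spearman rho: Pearson r applied to the ranks of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Hypothetical scores: perfectly monotonic but not linear
rho = spearman_rho([1, 2, 3, 4], [1, 4, 9, 16])
print(f"rho = {rho:.3f}")
```

Because only the ordering of the scores matters, a relationship that always increases, even if curved, yields a rho of 1.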