Uploaded by Rona Caberos

STAT Lesson5

Immaculate Conception - I College of Arts and Technology
STATISTICAL ANALYSIS WITH SOFTWARE APPLICATION
LESSON 5
Topics: Matrix and Partial Correlation using SPSS
No. of weeks: 2
At the end of the lesson, the students will be able to:
• Explain the procedure for computing a correlation matrix and partial correlations using SPSS
• Interpret the relationships once the computation has been done
Matrix Correlation
Pearson Product-Moment Correlation
• The Pearson product-moment correlation coefficient (Pearson's correlation, for short) is a
measure of the strength and direction of association that exists between two variables
measured on at least an interval scale.
• For example, you could use a Pearson's correlation to understand whether there is an
association between exam performance and time spent revising. You could also use a
Pearson's correlation to understand whether there is an association between depression and
length of unemployment.
• A Pearson's correlation attempts to draw a line of best fit through the data of two variables,
and the Pearson correlation coefficient, r, indicates how far away all these data points are from
this line of best fit (i.e., how well the data points fit this model/line of best fit).
• Note: If one of your two variables is dichotomous you can use a point-biserial
correlation instead, or if you have one or more control variables, you can run a Pearson's
partial correlation.
• When you choose to analyze your data using Pearson's correlation, part of the process
involves checking to make sure that the data you want to analyze can actually be analyzed
using Pearson's correlation. You need to do this because it is only appropriate to use
Pearson's correlation if your data "passes" four assumptions that are required for Pearson's
correlation to give you a valid result. Here are the four assumptions:
Assumption #1: Your two variables should be measured at the interval or ratio level (i.e., they
are continuous).
Assumption #2: There is a linear relationship between your two variables.
Assumption #3: There should be no significant outliers. Outliers are simply single data points
within your data that do not follow the usual pattern (e.g., in a study of 100 students'
IQ scores, where the mean score was 108 with only a small variation between
students, one student had a score of 156, which is very unusual, and may even put
her in the top 1% of IQ scores globally).
Assumption #4: Your variables should be approximately normally distributed.
• You can check assumptions #2, #3 and #4 using SPSS Statistics. Remember that if you do not
test these assumptions correctly, the results you get when running a Pearson's correlation
might not be valid.
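To make Assumption #3 concrete, here is a minimal sketch of one common screening rule: flag any score whose z-score is larger than 3 in absolute value. This is not the SPSS procedure itself, and the threshold of 3, the function name flag_outliers and the IQ scores below are all assumptions made up for illustration (the scores only mimic the example of a typical score near 108 and one student at 156).

```python
from statistics import mean, stdev


def flag_outliers(values, threshold=3.0):
    """Return the values whose absolute z-score exceeds the threshold."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation (n - 1 in the denominator)
    return [v for v in values if abs(v - m) / s > threshold]


# Hypothetical IQ scores: 19 students close to 108 and one student at 156.
iq_scores = [104, 105, 106, 107, 108, 109, 110, 111, 112,
             104, 106, 108, 110, 112, 107, 109, 105, 111, 108, 156]

outliers = flag_outliers(iq_scores)
print(outliers)  # [156]
```

A flagged value is only a candidate for closer inspection; whether to keep, correct or remove it remains a judgment about the data.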
Example and Data Setup in SPSS Statistics
A researcher wants to know whether a person's height is related to how well they perform in a
long jump. The researcher recruited untrained individuals from the general population, measured their
height and had them perform a long jump. The researcher then investigated whether there was an
association between height and long jump performance by running a Pearson's correlation.
In SPSS Statistics, we created two variables so that we could enter our data: Height (i.e.,
participants' height) and Jump_Dist (i.e., distance jumped in a long jump).
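For readers who want to see what SPSS computes behind the menu, the sketch below calculates Pearson's r directly from its definition (the covariance of the two variables divided by the product of their spreads). The lesson does not list the raw data, so the height and jump_dist values here are hypothetical, and pearson_r is an illustrative helper, not an SPSS function.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations around the means.
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / sqrt(s_xx * s_yy)


# Hypothetical data for 14 participants (metres); NOT the lesson's data set.
height = [1.60, 1.63, 1.67, 1.68, 1.71, 1.73, 1.74,
          1.75, 1.75, 1.80, 1.80, 1.86, 1.87, 1.96]
jump_dist = [2.13, 2.34, 2.25, 2.36, 2.30, 2.64, 2.44,
             2.29, 2.62, 2.48, 2.50, 2.62, 2.70, 3.13]

r = pearson_r(height, jump_dist)
print(f"r = {r:.3f}, n = {len(height)}")
```

The value of r always lies between -1 and +1; the sign gives the direction of the association and the magnitude gives its strength.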
Test Procedure in SPSS Statistics
1. Click Analyze > Correlate > Bivariate on the menu system, as shown below:
2. Transfer the variables Height and Jump_Dist into the Variables: box by dragging-and-dropping them or by clicking on them and then clicking on the arrow button. You will end up with a screen similar to the one below:
3. Make sure that the Pearson checkbox is selected under the Correlation Coefficients area.
4. Click on the Options button and you will be presented with the Bivariate Correlations:
Options dialogue box. If you wish to generate some descriptives, you can do it here by
clicking on the relevant checkbox in the Statistics area.
5. Click on the Continue button, and then click on the OK button to generate the results.
Output for Pearson's correlation
The Pearson's correlation result, presented as a matrix, is highlighted below:
Interpretation:
A Pearson product-moment correlation was run to determine the relationship between height
and distance jumped in a long jump. Based on the result, there was a strong, positive correlation
between height and distance jumped, which was statistically significant (r = .706, n = 14, p = .005).
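The reported significance can be checked by hand. The significance of a Pearson correlation is usually tested with t = r * sqrt((n - 2) / (1 - r^2)) on n - 2 degrees of freedom. The sketch below applies this to the reported r = .706 and n = 14. To stay within the Python standard library, the two-tailed p-value is obtained by numerically integrating the t density with Simpson's rule; this is an assumption of the sketch, not how SPSS computes it, and the helper names are illustrative.

```python
from math import exp, lgamma, pi, sqrt


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    const = exp(lgamma((df + 1) / 2) - lgamma(df / 2)) / sqrt(df * pi)
    return const * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t, df, steps=2000):
    """Two-tailed p-value: 1 minus twice the area between 0 and |t|."""
    upper = abs(t)
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return max(0.0, 1.0 - 2.0 * area)


def r_to_t(r, n):
    """Convert a Pearson r on n cases to a t statistic and its df."""
    df = n - 2
    return r * sqrt(df / (1 - r * r)), df


t_stat, df = r_to_t(0.706, 14)
p_value = two_tailed_p(t_stat, df)
print(f"t({df}) = {t_stat:.3f}, p = {p_value:.3f}")  # t(12) = 3.453, p = 0.005
```

The result agrees with the p = .005 shown in the interpretation above.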
Partial correlation
• Partial correlation is a measure of the strength and direction of a linear relationship between
two continuous variables while controlling for the effect of one or more other continuous
variables (also known as 'covariates' or 'control' variables).
• Although partial correlation does not make the distinction between independent and dependent
variables, the two variables are often considered in such a manner (i.e., you have one
continuous dependent variable and one continuous independent variable, as well as one or
more continuous control variables).
• Note: Many aspects of partial correlation can be dealt with using multiple regression, and it is
sometimes recommended that this is how you approach your analysis. This is somewhat
evident in SPSS Statistics, where you can carry out partial correlation using two different
procedures: Correlate and Regression.
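For the simplest case of a single control variable, the definition above can be written as a formula. This is the standard first-order partial correlation; the symbols x and y (the two variables of interest) and z (the control variable) are notation introduced here, not taken from the lesson. Each r on the right-hand side is an ordinary (zero-order) Pearson correlation.

```latex
r_{xy \cdot z} = \frac{r_{xy} - r_{xz}\, r_{yz}}{\sqrt{\left(1 - r_{xz}^{2}\right)\left(1 - r_{yz}^{2}\right)}}
```

If z is unrelated to both x and y (that is, r_{xz} = r_{yz} = 0), the partial correlation reduces to the zero-order correlation r_{xy}.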
Example:
You could use partial correlation to understand whether there is a linear relationship between
ice cream sales and price while controlling for daily temperature (i.e., the continuous dependent
variable would be "ice cream sales", measured in Php; the continuous independent variable would be
"price", also measured in Php; and the single control variable, that is, the single continuous
independent variable you are adjusting for, would be daily temperature, measured in °C). You may
believe that there is a relationship between ice cream sales and price (i.e., sales go down as price
goes up), but you would like to know whether this relationship is affected by daily temperature (e.g.,
whether the relationship changes when daily temperature is taken into account, since you suspect
customers are more willing to buy ice cream, irrespective of price, on a really nice, hot day).

Before we introduce procedures on how to carry out a partial correlation using SPSS Statistics,
we need first to understand the different assumptions that our data must meet in order for a
partial correlation to give us a valid result.
• When we choose to analyze our data using partial correlation, part of the process involves
checking to make sure that the data we want to analyze can actually be analyzed using partial
correlation. We need to do this because it is only appropriate to use a partial correlation if our
data "passes" five assumptions that are required for a partial correlation to give us a valid
result.
• Before we introduce you to these five assumptions, do not be surprised if, when analyzing your
own data using SPSS Statistics, one or more of these assumptions is not met. Even when your
data fails certain assumptions, there is often a solution to overcome this. First, let's take a look
at these five assumptions:
Assumption #1: You have one (dependent) variable and one (independent) variable and these
are both measured on a continuous scale (i.e., they are measured on an interval or ratio scale).
Examples of continuous variables include revision time (measured in hours), intelligence (measured
using IQ score), exam performance (measured from 0 to 100), weight (measured in kg), temperature
(measured in °C), sales (measured in US dollars), and so forth.
Assumption #2: You have one or more control variables, also known as covariates (i.e., control
variables are just variables that you are using to adjust the relationship between the other two
variables; that is, your dependent and independent variables). These control variables are also
measured on a continuous scale (i.e., they are continuous variables). Examples of continuous
variables are provided above.
Assumption #3: There needs to be a linear relationship between all three variables. That is, all
possible pairs of variables must show a linear relationship. This is usually checked by visually
inspecting a scatterplot of each pair.
Assumption #4: There should be no significant outliers. Outliers are simply single data points
within your data that do not follow the usual pattern. Partial correlation is sensitive to outliers, which
can have a very large effect on the line of best fit and the correlation coefficient, leading to incorrect
conclusions regarding your data. Therefore, it is best if there are no outliers or they are kept to a
minimum.
Assumption #5: Your variables should be approximately normally distributed. In order to assess
the statistical significance of the partial correlation, you need to have bivariate normality for each pair
of variables. This assumption is difficult to assess, so a simpler method is more commonly used
whereby the distribution of each variable is tested individually. This can be done using the
Shapiro-Wilk test of normality, which is easy to run in SPSS Statistics.

You can check assumptions #3, #4 and #5 using SPSS Statistics. Remember that if you do not
run the statistical tests on these assumptions correctly, the results you get when running a
partial correlation might not be valid.
Example & Data Setup in SPSS Statistics
A researcher wants to know whether there is a statistically significant linear relationship
between VO2max (a marker of aerobic fitness) and a person's weight. Furthermore, the researcher
wants to know whether this relationship remains after accounting for a person's age (i.e., if the
relationship is influenced by a person's age). Therefore, the researcher uses partial correlation to
determine whether there is a linear relationship between VO2max and weight, while
controlling for age (i.e., the continuous dependent variable is "VO2max", measured in ml/min/kg, the
continuous independent variable is "weight", measured in kg, and the control variable – that is, the
additional continuous independent variable the researcher is adjusting for – is "age", measured in
years).
In SPSS Statistics, three variables were created so that the data could be
entered: VO2max (i.e., the person's VO2max, measured in ml/min/kg), weight (i.e., the person's
weight, measured in kg) and age (i.e., the person's age, measured in years).
Note: This is a simple example of partial correlation with a single continuous control variable, but you
can include multiple control variables in your analysis.
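As a rough illustration of what the procedure does with these three variables, the sketch below computes the partial correlation between VO2max and weight, controlling for age, in two ways: from the three zero-order correlations, and by correlating the residuals left over after regressing each variable on age (the regression view mentioned earlier). The ten data rows are hypothetical, since the lesson does not list the actual data set, and pearson_r, partial_r and residuals are illustrative helpers, not SPSS functions.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / sqrt(s_xx * s_yy)


def partial_r(x, y, z):
    """First-order partial correlation of x and y, controlling for z."""
    r_xy, r_xz, r_yz = pearson_r(x, y), pearson_r(x, z), pearson_r(y, z)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def residuals(y, z):
    """Residuals from the simple linear regression of y on z."""
    n = len(y)
    mean_z, mean_y = sum(z) / n, sum(y) / n
    slope = (sum((a - mean_z) * (b - mean_y) for a, b in zip(z, y))
             / sum((a - mean_z) ** 2 for a in z))
    return [b - (mean_y + slope * (a - mean_z)) for a, b in zip(z, y)]


# Hypothetical values for 10 participants; NOT the lesson's data set.
vo2max = [52.1, 48.3, 45.0, 41.2, 39.8, 44.6, 36.5, 50.2, 38.9, 42.7]  # ml/min/kg
weight = [62.0, 70.5, 78.2, 85.0, 91.3, 74.8, 98.6, 66.4, 88.1, 80.0]  # kg
age = [25, 41, 22, 35, 30, 52, 28, 45, 24, 38]                          # years

by_formula = partial_r(vo2max, weight, age)
by_residuals = pearson_r(residuals(vo2max, age), residuals(weight, age))
print(f"formula: {by_formula:.4f}, residuals: {by_residuals:.4f}")
```

The two routes give the same number, which is why the Correlate and Regression procedures can both be used for this analysis. With more than one control variable, the residual route generalises by regressing on all of the control variables at once.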
Test Procedure in SPSS Statistics
The five steps below show you how to analyze your data using a partial correlation in SPSS
Statistics when none of the five assumptions have been violated. At the end of these five steps, we
show you how to interpret the results from this test.
Note: In this example we show you how to use the Correlate procedure in SPSS Statistics, which is
very straightforward, but it is also possible to use the Regression procedure, which has a number of
advantages. For the purposes of a simple example like the one used in this "quick start" guide, we
will use the Correlate procedure.
1. Click Analyze > Correlate > Partial... on the menu system, as shown below:
2. Transfer the variables weight and VO2max into the Variables: box, and age into
the Controlling for: box, by dragging-and-dropping or by clicking the relevant arrow
buttons. You will end up with a screen similar to the one below:
3. Click on the Options button. You will be presented with the following Partial Correlations:
Options screen:
4. Tick the Means and standard deviations and Zero-order correlations checkboxes in the
Statistics area, as shown below:
5. Click on the Continue button, and then click on the OK button to generate the results.
Interpreting the Results of a Partial Correlation
SPSS Statistics generates two tables for a partial correlation based on the procedure you ran,
as shown below:
The descriptive statistics show that we had no missing data, since the recorded sample
size, N = 100, is the same as the number of participants who took part in the study. We can also see
that the mean value of the dependent variable, VO2max, was 43.63 ml/min/kg (with a standard
deviation of 8.57 ml/min/kg), while the mean weight of participants was 79.7 kg (with a
standard deviation of 15.1 kg) and the mean age of participants was 31.1 years (with a
standard deviation of 9.1 years). This suggests that the sample of participants was slightly on the
younger side rather than representing the population as a whole.
The Correlations table is split into two main parts: (a) the Pearson product-moment
correlation coefficients for all your variables – that is, your dependent variable, independent variable,
and one or more control variables – as highlighted by the blue rectangle; and (b) the results from the
partial correlation where the Pearson product-moment correlation coefficient between the dependent
and independent variable has been adjusted to take into account the control variable(s), as
highlighted by the red rectangle.
Note: You can always identify the first part of the Correlations table, which contains the Pearson
product-moment correlation coefficients for all your variables because this will be labeled "-none-a"
in the far left-hand column of the table. These are also known as zero-order correlations. The
second part of the table, which presents results of the partial correlation will contain the label of the
control variable in the far left-hand column (i.e., in our example, "Age").
The results of the partial correlation highlighted by the red rectangle show that there was a
moderate, negative partial correlation between the dependent variable, "VO2max", and independent
variable, "weight", whilst controlling for "age", which was statistically significant (r(97) = -.314, n =
100, p = .002). However, when we refer to the Pearson's product-moment correlation – also known as
the zero-order correlation – between "VO2max" and "weight", without controlling for "age", as
highlighted by the blue rectangle, we can see that there was also a statistically significant, moderate,
negative correlation between "VO2max" and "weight" (r(98) = -.307, n = 100, p = .002). This suggests
that controlling for "age" had very little influence on the relationship between "VO2max" and "weight".
Interpretation:
A partial correlation was run to determine the relationship between an individual's VO2max and
weight whilst controlling for age. There was a moderate, negative partial correlation between VO2max
(43.63 ± 8.57 ml/min/kg) and weight (79.66 ± 15.09 kg) whilst controlling for age (31.1 ± 9.1 years),
which was statistically significant, r(97) = -.314, N = 100, p = .002. However, zero-order correlations
showed that there was also a statistically significant, moderate, negative correlation between VO2max
and weight (r(98) = -.307, N = 100, p = .002), indicating that controlling for age had very little
influence on the relationship between VO2max and weight.
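Two details of this write-up can be checked with simple arithmetic: the number in brackets after r is the degrees of freedom (N - 2 for a zero-order correlation, minus one more for each control variable), and squaring r gives the proportion of variance the two variables share. The sketch below applies both to the reported values; partial_df is an illustrative helper name, not an SPSS function.

```python
def partial_df(n, k):
    """Degrees of freedom for a correlation on n cases with k control variables."""
    return n - 2 - k


n = 100
# Reported coefficients: label -> (r, number of control variables)
reported = {"zero-order": (-0.307, 0), "partial (age)": (-0.314, 1)}

for label, (r, k) in reported.items():
    print(f"{label}: r({partial_df(n, k)}) = {r:.3f}, r^2 = {r * r:.3f}")
# zero-order: r(98) = -0.307, r^2 = 0.094
# partial (age): r(97) = -0.314, r^2 = 0.099
```

Shared variance moves from about 9.4% to about 9.9% once age is controlled for, a change of less than half a percentage point, which is consistent with the conclusion that age had very little influence.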
References:
https://statistics.laerd.com/spss-tutorials/pearsons-product-moment-correlation-using-spss-statistics.php
https://statistics.laerd.com/spss-tutorials/partial-correlation-using-spss-statistics.php
https://libguides.library.kent.edu/SPSS/PearsonCorr
Activity No. 5
PROBLEM STATEMENT: Using the matrix results below, test whether there is a statistically significant
linear relationship between two continuous variables, weight and height (and, by extension, infer
whether the association is significant in the population), and determine the strength and direction of
the association.