Immaculate Conception - I College of Arts and Technology
STATISTICAL ANALYSIS WITH SOFTWARE APPLICATION

LESSON 5
Topics: Matrix and Partial Correlation using SPSS
No. of weeks: 2

At the end of the lesson, the students will be able to:
1. Explain the procedure for computing a correlation matrix and partial correlations using SPSS
2. Interpret the relationships once the computation has been done

Matrix Correlation

Pearson Product-Moment Correlation

The Pearson product-moment correlation coefficient (Pearson's correlation, for short) is a measure of the strength and direction of association that exists between two variables measured on at least an interval scale. For example, you could use a Pearson's correlation to understand whether there is an association between exam performance and time spent revising. You could also use a Pearson's correlation to understand whether there is an association between depression and length of unemployment.

A Pearson's correlation attempts to draw a line of best fit through the data of two variables, and the Pearson correlation coefficient, r, indicates how far away all these data points are from this line of best fit (i.e., how well the data points fit this model/line of best fit). The coefficient ranges from -1 to +1: the sign gives the direction of the association and the magnitude gives its strength.

Note: If one of your two variables is dichotomous, you can use a point-biserial correlation instead. If you have one or more control variables, you can run a Pearson's partial correlation.

When you choose to analyze your data using Pearson's correlation, part of the process involves checking that the data can actually be analyzed this way. Pearson's correlation gives a valid result only if your data meet the following four assumptions:

Assumption #1: Your two variables should be measured at the interval or ratio level (i.e., they are continuous).
Assumption #2: There is a linear relationship between your two variables.
Assumption #3: There should be no significant outliers. Outliers are single data points that do not follow the usual pattern of your data (e.g., in a study of 100 students' IQ scores where the mean score was 108 with only a small variation between students, one student had a score of 156, which is very unusual and may even put her in the top 1% of IQ scores globally).
Assumption #4: Your variables should be approximately normally distributed.

You can check assumptions #2, #3 and #4 using SPSS Statistics. Remember that if you do not test these assumptions correctly, the results you get when running a Pearson's correlation might not be valid.

Example and Data Setup in SPSS Statistics

A researcher wants to know whether a person's height is related to how well they perform in a long jump. The researcher recruited untrained individuals from the general population, measured their height and had them perform a long jump. The researcher then investigated whether there was an association between height and long jump performance by running a Pearson's correlation. In SPSS Statistics, we created two variables so that we could enter our data: Height (i.e., participants' height) and Jump_Dist (i.e., distance jumped in a long jump).

Test Procedure in SPSS Statistics

1. Click Analyze > Correlate > Bivariate on the menu system, as shown below:
2. Transfer the variables Height and Jump_Dist into the Variables: box by dragging-and-dropping them or by clicking on them and then clicking on the arrow button. You will end up with a screen similar to the one below:
3. Make sure that the Pearson checkbox is selected under the Correlation Coefficients area.
4. Click on the Options button and you will be presented with the Bivariate Correlations: Options dialogue box. If you wish to generate some descriptives, you can do so here by ticking the relevant checkbox in the Statistics area.
5. Click on the Continue button, and then click on the OK button to generate the results.

Output for Pearson's correlation

The Pearson's correlation result, presented as a matrix, is highlighted below:

Interpretation: A Pearson product-moment correlation was run to determine the relationship between height and distance jumped in a long jump. There was a strong, positive correlation between height and distance jumped, which was statistically significant (r = .706, n = 14, p = .005).

Partial Correlation

Partial correlation is a measure of the strength and direction of a linear relationship between two continuous variables while controlling for the effect of one or more other continuous variables (also known as 'covariates' or 'control' variables). Although partial correlation does not distinguish between independent and dependent variables, the two variables are often treated that way (i.e., you have one continuous dependent variable and one continuous independent variable, as well as one or more continuous control variables).

Note: Many aspects of partial correlation can be dealt with using multiple regression, and it is sometimes recommended that you approach your analysis this way. This is reflected in SPSS Statistics, where you can carry out a partial correlation using two different procedures: Correlate and Regression.
Example: You could use partial correlation to understand whether there is a linear relationship between ice cream sales and price while controlling for daily temperature. Here, the continuous dependent variable would be "ice cream sales", measured in Php; the continuous independent variable would be "price", also measured in Php; and the single control variable (that is, the single continuous independent variable you are adjusting for) would be daily temperature, measured in °C. You may believe that there is a relationship between ice cream sales and price (i.e., sales go down as price goes up), but you would like to know whether this relationship changes once daily temperature is taken into account, since you suspect customers are more willing to buy ice cream, irrespective of price, on a really nice, hot day.

Before we introduce the procedure for carrying out a partial correlation in SPSS Statistics, we need to understand the assumptions that our data must meet for a partial correlation to give a valid result. Part of the analysis involves checking that the data can actually be analyzed using partial correlation, because it is only appropriate to use a partial correlation if the data meet five assumptions. Do not be surprised if, when analyzing your own data using SPSS Statistics, one or more of these assumptions is not met. Even when your data fail certain assumptions, there is often a way to overcome this.
First, let's take a look at these five assumptions:

Assumption #1: You have one (dependent) variable and one (independent) variable, and both are measured on a continuous scale (i.e., an interval or ratio scale). Examples of continuous variables include revision time (measured in hours), intelligence (measured using IQ score), exam performance (measured from 0 to 100), weight (measured in kg), temperature (measured in °C), sales (measured in US dollars), and so forth.
Assumption #2: You have one or more control variables, also known as covariates (i.e., variables that you use to adjust the relationship between your dependent and independent variables). These control variables are also measured on a continuous scale. Examples of continuous variables are provided above.
Assumption #3: There needs to be a linear relationship between all three variables. That is, all possible pairs of variables must show a linear relationship. This is usually checked by visually inspecting scatterplots.
Assumption #4: There should be no significant outliers. Outliers are single data points that do not follow the usual pattern of your data. Partial correlation is sensitive to outliers, which can have a very large effect on the line of best fit and the correlation coefficient, leading to incorrect conclusions about your data. Therefore, it is best if there are no outliers or they are kept to a minimum.
Assumption #5: Your variables should be approximately normally distributed.
In order to assess the statistical significance of the partial correlation, you need bivariate normality for each pair of variables. Because this is difficult to assess, a simpler method is more commonly used in which the distribution of each variable is tested individually. This can be done with the Shapiro-Wilk test of normality, which is available in SPSS Statistics.

You can check assumptions #3, #4 and #5 using SPSS Statistics. Remember that if you do not run the statistical tests on these assumptions correctly, the results you get when running a partial correlation might not be valid.

Example & Data Setup in SPSS Statistics

A researcher wants to know whether there is a statistically significant linear relationship between VO2max (a marker of aerobic fitness) and a person's weight. Furthermore, the researcher wants to know whether this relationship remains after accounting for a person's age (i.e., whether the relationship is influenced by age). Therefore, the researcher uses partial correlation to determine whether there is a linear relationship between VO2max and weight while controlling for age. The continuous dependent variable is "VO2max", measured in ml/min/kg; the continuous independent variable is "weight", measured in kg; and the control variable (that is, the additional continuous independent variable the researcher is adjusting for) is "age", measured in years.

In SPSS Statistics, three variables were created so that the data could be entered: VO2max (i.e., the person's VO2max, measured in ml/min/kg), weight (i.e., the person's weight, measured in kg) and age (i.e., the person's age, measured in years).

Note: This is a simple example of partial correlation with a single continuous control variable, but you can include multiple control variables in your analysis.
Test Procedure in SPSS Statistics

The five steps below show how to analyze your data using a partial correlation in SPSS Statistics when none of the five assumptions has been violated. After these five steps, we show how to interpret the results.

Note: This example uses the Correlate procedure in SPSS Statistics, which is very straightforward. It is also possible to use the Regression procedure, which has a number of advantages, but the Correlate procedure is sufficient for a simple example like the one in this lesson.

1. Click Analyze > Correlate > Partial... on the menu system, as shown below:
2. Transfer the variables weight and VO2max into the Variables: box, and age into the Controlling for: box, by dragging-and-dropping or by clicking the relevant arrow buttons. You will end up with a screen similar to the one below:
3. Click on the Options button. You will be presented with the following Partial Correlations: Options screen:
4. Tick the Means and standard deviations and Zero-order correlations checkboxes in the Statistics area, as shown below:
5. Click on the Continue button, and then click on the OK button to generate the results.

Interpreting the Results of a Partial Correlation

SPSS Statistics generates two tables for a partial correlation based on the procedure you ran, as shown below:

The descriptive statistics show that we had no missing data, since the recorded sample size, N = 100, is the same as the number of participants that took part in the study.
We can also see that the mean value of the dependent variable, VO2max, was 43.63 ml/min/kg (with a standard deviation of 8.57 ml/min/kg), the mean weight of participants was 79.7 kg (with a standard deviation of 15.1 kg), and the mean age of participants was 31.1 years (with a standard deviation of 9.1 years). This suggests that the sample was slightly on the younger side rather than representative of the population as a whole.

The Correlations table is split into two main parts: (a) the Pearson product-moment correlation coefficients for all your variables (that is, your dependent variable, independent variable, and one or more control variables), as highlighted by the blue rectangle; and (b) the results from the partial correlation, where the Pearson product-moment correlation coefficient between the dependent and independent variable has been adjusted to take into account the control variable(s), as highlighted by the red rectangle.

Note: You can always identify the first part of the Correlations table, which contains the Pearson product-moment correlation coefficients for all your variables, because it is labeled "-none-a" in the far left-hand column of the table. These are also known as zero-order correlations. The second part of the table, which presents the results of the partial correlation, contains the label of the control variable in the far left-hand column (i.e., in our example, "Age").

The results of the partial correlation highlighted by the red rectangle show that there was a moderate, negative partial correlation between the dependent variable, "VO2max", and the independent variable, "weight", whilst controlling for "age", which was statistically significant (r(97) = -.314, n = 100, p = .002).
However, when we refer to the Pearson's product-moment correlation (also known as the zero-order correlation) between "VO2max" and "weight", without controlling for "age", as highlighted by the blue rectangle, we can see that there was also a statistically significant, moderate, negative correlation between "VO2max" and "weight" (r(98) = -.307, n = 100, p = .002). Because the two coefficients are nearly identical, this suggests that controlling for "age" had very little effect on the relationship between "VO2max" and "weight".

Interpretation: A partial correlation was run to determine the relationship between an individual's VO2max and weight whilst controlling for age. There was a moderate, negative partial correlation between VO2max (43.63 ± 8.57 ml/min/kg) and weight (79.66 ± 15.09 kg) whilst controlling for age (31.1 ± 9.1 years), which was statistically significant, r(97) = -.314, N = 100, p = .002. However, zero-order correlations showed that there was also a statistically significant, moderate, negative correlation between VO2max and weight (r(98) = -.307, n = 100, p = .002), indicating that age had very little influence on the relationship between VO2max and weight.

References:
https://statistics.laerd.com/spss-tutorials/pearsons-product-moment-correlation-using-spss-statistics.php
https://statistics.laerd.com/spss-tutorials/partial-correlation-using-spss-statistics.php
https://libguides.library.kent.edu/SPSS/PearsonCorr

Activity No. 5

PROBLEM STATEMENT: On the matrix results below, test whether there is a statistically significant linear relationship between two continuous variables, weight and height (and by extension, infer whether the association is significant in the population); and determine the strength and direction of the association.
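As a cross-check when reading a correlation matrix, the significance of a reported coefficient can be approximated by hand from r and its degrees of freedom, using t = r * sqrt(df) / sqrt(1 - r^2). For a Pearson's correlation df = n - 2, and for a partial correlation df = n - 2 - (number of control variables), which is the number shown in brackets in results such as r(97). The Python sketch below is a minimal illustration under these assumptions, not SPSS's own routine. The function names are hypothetical, and the p-value is obtained by simple numerical integration of the t distribution rather than by a library call.

```python
import math


def t_pdf(t, df):
    """Density of Student's t distribution with `df` degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(t * t / df))


def two_tailed_p(t, df, steps=2000):
    """Two-tailed p-value, integrating the density on [0, |t|] with Simpson's rule."""
    upper = abs(t)
    h = upper / steps  # `steps` must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return max(0.0, 1.0 - 2.0 * area)


def r_significance(r, df):
    """Return (t statistic, two-tailed p) for a correlation r; requires |r| < 1."""
    t = r * math.sqrt(df) / math.sqrt(1 - r * r)
    return t, two_tailed_p(t, df)


# Values reported in this lesson: Pearson r = .706 with n = 14 (df = 12),
# and partial r = -.314 with df = 97.
for label, r, df in [("height vs jump", 0.706, 12), ("VO2max vs weight | age", -0.314, 97)]:
    t, p = r_significance(r, df)
    print(f"{label}: t({df}) = {t:.2f}, p = {p:.4f}")
```

The printed p-values should round to the values reported in the lesson (.005 and .002). A result is conventionally called statistically significant when p is below .05.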