Correlation
• Correlation: the mathematical extent to which two variables are related to each other
  – Correlation refers to both a type of research design and a descriptive statistical procedure.
  – Generally performed between two scores obtained from the same source

Correlation Coefficient
• Correlation Coefficient: a number between +1 and -1 that represents the strength and direction of the relationship between two variables
• Correlations closer to +1 or -1 are stronger and allow more accurate prediction of one variable from the other

Types of Correlation Coefficients
• Pearson r: both variables are measured at an interval/ratio level
• Spearman rho: used when at least one variable is measured at the ordinal level (scores on the other variable must be converted to ranks)

Positive Correlations
• Positive Correlation: a correlation coefficient whose value is between 0 and +1
• Indicates that high scores on one variable are associated with high scores on the other variable
• The values of the variables increase and decrease together.
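The note on Spearman rho above (converting scores to ranks) can be illustrated with a minimal Python sketch. It assumes that Spearman rho is computed as a Pearson r on the ranked scores, with tied scores sharing their average rank. The function names and the data are hypothetical and chosen only for illustration.

```python
from math import sqrt


def ranks(scores):
    """Rank scores from 1 (lowest) upward; tied scores share the average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    result = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of scores tied with the score at position i.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson_r(x, y):
    """Pearson r computed from deviation scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman rho: Pearson r applied to the ranks of both variables."""
    return pearson_r(ranks(x), ranks(y))


# Hypothetical data: class rank is ordinal (1 = top of the class),
# exam score is interval and gets converted to ranks inside spearman_rho.
class_rank = [1, 2, 3, 4, 5]
exam_score = [95, 90, 70, 80, 60]

rho = spearman_rho(class_rank, exam_score)
print(f"Spearman rho = {rho:.2f}")
```

Because rank 1 marks the top student here, a strong negative rho means that better class standing goes with higher exam scores.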
Negative Correlations
• Negative Correlation: a correlation coefficient whose value is between 0 and -1
• Indicates that there is an inverse relationship between the two sets of scores
• A high score on X is related to a low score on Y, and vice versa

Linear Relationships
• Linear Relationship: a condition wherein the relationship between two variables can best be described by a straight line (the regression line or the line of best fit)
[Figure: scatterplot of Freshman GPA (Y axis) against SAT Score (X axis)]

Scatterplots
• Scatterplot: provides a visual representation of the relationship between variables
• Each point represents paired measurements on two variables for a specific individual

Understanding the Pearson Product Moment Correlation Coefficient
• Pearson r: represents the extent to which individuals occupy the same relative position in two distributions
• Definitional Equation: $r = \frac{\sum z_X z_Y}{N}$
• Important Reminder:
  – $\sum z^2 = N$

Interpreting the Correlation Coefficient
• Coefficient of Determination ($r^2$): the proportion of variance in one variable that can be described or explained by the other variable
• Coefficient of Nondetermination ($1 - r^2$): the proportion of variance in one variable that cannot be described or explained by the other variable

Correlation Matrices
• Tables of correlations are generated when more than two variables are involved.
• A Correlation Matrix is a table in which each variable is listed both at the top and at the left side, and the correlation of all possible pairs of variables is shown inside the table
• An asterisk identifies significant correlations.

Correlations
                                               Freshman GPA   Hours Worked per Week
Freshman GPA            Pearson Correlation    1              -.693**
                        Sig. (2-tailed)        .              .004
                        N                      15             15
Hours Worked per Week   Pearson Correlation    -.693**        1
                        Sig. (2-tailed)        .004           .
                        N                      15             15
**. Correlation is significant at the 0.01 level (2-tailed).
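The definitional equation for Pearson r and the two coefficients derived from it can be worked through in a short Python sketch. It assumes z scores are computed with the population standard deviation (dividing by N), which is what makes the reminder that the squared z scores sum to N hold. The data values and function names are hypothetical and are not taken from the SPSS output.

```python
from math import sqrt


def z_scores(scores):
    """Convert raw scores to z scores using the population SD (divide by N)."""
    n = len(scores)
    mean = sum(scores) / n
    sd = sqrt(sum((s - mean) ** 2 for s in scores) / n)
    return [(s - mean) / sd for s in scores]


def pearson_r(x, y):
    """Definitional equation: r = sum(z_X * z_Y) / N."""
    zx, zy = z_scores(x), z_scores(y)
    return sum(a * b for a, b in zip(zx, zy)) / len(x)


# Hypothetical paired scores for five students.
hours_worked = [0, 5, 10, 15, 20]
gpa = [3.8, 3.4, 3.1, 2.9, 2.3]

r = pearson_r(hours_worked, gpa)
determination = r ** 2            # proportion of variance explained
nondetermination = 1 - r ** 2     # proportion of variance left unexplained

# Check the reminder: the squared z scores of a variable sum to N.
sum_z_squared = sum(z ** 2 for z in z_scores(gpa))

print(f"r = {r:.3f}")
print(f"r^2 = {determination:.3f}, 1 - r^2 = {nondetermination:.3f}")
print(f"sum of squared z scores = {sum_z_squared:.3f} (N = {len(gpa)})")
```

In this made-up data set the two variables move in opposite directions, so r comes out negative while the coefficient of determination is still positive.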
Caution: Spurious Correlations
• Spurious Correlations: correlation coefficients that are artificially high or low because of the nature of the data or the method of collecting the data
• Common Causes of Spurious Correlations:
  – A nonlinear relationship
  – Truncated range
  – Sample size
  – Outliers
  – Multiple populations
  – Extreme scores

Caution: No Causality
• Correlations only tell us that two variables are related; they do not determine causality
• Four Possible Explanations:
  1. X → Y (Temporal Directionality)
  2. Y → X (Temporal Directionality)
  3. X ↔ Y (Bidirectional Causation)
  4. Z → X and Y (Third Variable Problem)

Computing the Correlation Coefficient Using SPSS
• Analyze → Correlate → Bivariate
• Select the variables to be correlated on the left side of the Bivariate Correlations window and move them to the right side
• Select the appropriate correlation coefficient
• Check Two-tailed and Flag significant correlations, then click OK

Interpreting the Output

Correlations
                                                Freshman   SAT       Hours Studied   Hours Worked
                                                GPA        Score     per Week        per Week
Freshman GPA             Pearson Correlation    1          .685**    .548*           -.693**
                         Sig. (2-tailed)        .          .005      .034            .004
                         N                      15         15        15              15
SAT Score                Pearson Correlation    .685**     1         .041            -.612*
                         Sig. (2-tailed)        .005       .         .884            .015
                         N                      15         15        15              15
Hours Studied per Week   Pearson Correlation    .548*      .041      1               -.398
                         Sig. (2-tailed)        .034       .884      .               .142
                         N                      15         15        15              15
Hours Worked per Week    Pearson Correlation    -.693**    -.612*    -.398           1
                         Sig. (2-tailed)        .004       .015      .142            .
                         N                      15         15        15              15
**. Correlation is significant at the 0.01 level (2-tailed).
*. Correlation is significant at the 0.05 level (2-tailed).

Creating a Scatterplot
• Graphs → Scatter
• Click Simple
• Click Define
• Move the criterion variable to the Y axis box
• Move the predictor variable to the X axis box
• Click OK
• Double-click on the chart to edit it.
• Click Fit Line at Total.
• Click OK

Reading Scatterplots
[Figure: two scatterplots of Freshman GPA (Y axis), one against Hours Worked per Week and one against Hours Studied per Week]

Linear Regression
• An important use of the correlation coefficient is to predict one set of scores from another.
• If we know a person's score on one variable, we can use that score to predict the person's score on the correlated variable.

The Regression Line
• Line of Best Fit: minimizes the sum of the squared vertical distances between the individual points and the regression line
[Figure: scatterplot of Freshman GPA (Y axis) against SAT Score (X axis) with the regression line]

The Regression Equation
• Equation: $Y' = a_Y + b_Y X$
• Where:
  – $Y'$ = the predicted score of Y based on a known value of X
  – $a_Y$ = the intercept of the regression line
  – $b_Y$ = the slope of the line
  – $X$ = the score being used as the predictor

In English Please…
• Slope: how much variable Y changes when variable X increases by one unit
• Intercept: the value of variable Y when X = 0
• Predictor Variable: the variable X, which is used to predict the score on variable Y (antecedent or independent variable)
• Criterion Variable: the variable that is predicted (dependent variable)

Linear Regression Using SPSS
• Analyze → Regression → Linear
• Click on the criterion variable and move it to the Dependent box
• Click on the predictor variable and move it to the Independent(s) box
• Click Statistics, check Descriptives, and make sure that Estimates and Model fit are also selected
• Click Continue
• Click OK

Interpreting the Output

ANOVA(b)
Model 1        Sum of Squares   df   Mean Square   F       Sig.
Regression     2.862            1    2.862         5.575   .034(a)
Residual       6.674            13   .513
Total          9.536            14
a. Predictors: (Constant), Hours Studied per Week
b. Dependent Variable: Freshman GPA

• The F value in the ANOVA box indicates whether the predictor variable was a significant predictor of the criterion variable.

Coefficients(a)
                          Unstandardized Coefficients   Standardized Coefficients
Model 1                   B        Std. Error           Beta                        t       Sig.
(Constant)                1.735    .395                                             4.388   .001
Hours Studied per Week    .060     .025                 .548                        2.361   .034
a. Dependent Variable: Freshman GPA

• The unstandardized coefficient for the constant reflects the Y intercept of the regression equation.
• The unstandardized coefficient for the predictor variable reflects the slope of the line.
• The regression equation for this example would be $Y' = 1.735 + .06X$
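To make the regression equation concrete, the following Python sketch plugs the reported intercept and slope into the equation and, separately, shows one common way to obtain a least-squares slope and intercept from raw scores. The function names, the choice of 10 study hours, and the small data set are hypothetical. The 15 raw scores behind the SPSS output are not available here, so the fit is demonstrated on made-up points.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for predicting y from x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(intercept, slope, x):
    """Regression equation: Y' = a_Y + b_Y * X."""
    return intercept + slope * x


# Coefficients reported in the Coefficients table of the output.
a_y, b_y = 1.735, 0.060

# Predicted freshman GPA for a (hypothetical) student who studies 10 hours per week.
predicted_gpa = predict(a_y, b_y, 10)
print(f"Predicted GPA: {predicted_gpa:.3f}")

# Hypothetical points lying exactly on Y = 2 + 0.5X, to show fit_line at work.
intercept, slope = fit_line([0, 2, 4, 6], [2.0, 3.0, 4.0, 5.0])
print(f"Fitted line: Y' = {intercept:.2f} + {slope:.2f}X")

# Proportion of variance explained, from the ANOVA sums of squares:
# regression SS divided by total SS.
r_squared = 2.862 / 9.536
print(f"r^2 from the ANOVA table: {r_squared:.3f}")
```

With a single predictor, the ratio of the regression sum of squares to the total sum of squares should agree, within rounding, with the square of the standardized coefficient (.548) shown in the Coefficients table.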