Chapter 9: Correlation and Regression Analysis

Correlation
• Correlation is a numerical measure of the strength and direction of the linear association between an independent variable and a dependent variable.
• The Pearson correlation coefficient (r) is the most widely used measure of correlation.
• The sign of r indicates the direction of the association.
• The size (absolute value) of r indicates the strength of the association; r always lies between -1 and +1.

Direction
• A positive correlation indicates that as one variable increases, the other tends to increase.
– An example might be height and weight.
• A negative correlation indicates that as one variable increases, the other tends to decrease.
– An example might be turnovers and wins.
• No correlation means that there is no linear association between the two variables.
– An example might be intelligence and the amount of coffee you drink.

Strength
[Figure: examples of positive, linear associations with different amounts of strength.]

Scatterplots
• The graphs on the previous slide are known as scatterplots, or scatter diagrams.
• Scatterplots are graphical displays that show the relationship between two numerical variables.
• The independent variable is plotted on the x-axis and the dependent variable is plotted on the y-axis.

Calculating the Correlation Coefficient
• The correlation coefficient is obtained by dividing the sample covariance by the product of the standard deviations of the two variables.

Example: Construct a scatter diagram and calculate the correlation coefficient for the data set below.

Interpretation
• So what does an r of -0.918 mean?
1) There is a very strong negative correlation between x and y.
2) Larger values of x tend to correspond to smaller values of y, and smaller values of x tend to correspond to larger values of y.

Example: Try this example. Find the correlation coefficient of the data set below.
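The calculation described under "Calculating the Correlation Coefficient" (sample covariance divided by the product of the two sample standard deviations) can be sketched in Python as follows. This is a minimal illustration, not the chapter's own worked example: the function name pearson_r and the five data points are hypothetical, chosen only to show the arithmetic.

```python
import math

def pearson_r(x, y):
    """Pearson r = sample covariance / (s_x * s_y)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sample covariance (divide by n - 1).
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)
    # Sample standard deviations (divide by n - 1).
    s_x = math.sqrt(sum((xi - mean_x) ** 2 for xi in x) / (n - 1))
    s_y = math.sqrt(sum((yi - mean_y) ** 2 for yi in y) / (n - 1))
    return cov_xy / (s_x * s_y)

# Hypothetical data, NOT the data set from the slides.
x = [1, 2, 3, 4, 5]
y = [10, 8, 7, 4, 1]

r = pearson_r(x, y)
print(round(r, 3))  # -0.984: a very strong negative linear association
```

As with the r of -0.918 interpreted above, the negative sign says that larger x values tend to pair with smaller y values, and the magnitude close to 1 says the points lie close to a straight line.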
The Coefficient of Determination
• The coefficient of determination (r²) is the square of the correlation coefficient.
• It indicates the percentage of the variation in the dependent variable that can be explained by the variation in the independent variable.

So what exactly does this mean?
Suppose we randomly selected 16 trucks and looked at the relationship between the miles on each truck and its price.
• As one would suspect, as miles increase, price tends to decrease.
• The coefficient of determination for this data set is 0.664, or 66.4%.
• We can interpret this by saying that 66.4% of the variation in price is accounted for by the linear model relating price to miles driven.
• Another way to look at this is that 33.6% of the variation in truck price is accounted for by factors other than miles.

Regression
• Regression analysis is a statistical technique for estimating and predicting the value of one variable (the dependent variable) on the basis of knowledge of another variable (the independent variable).
• The goal is to obtain the equation of the line of best fit (also known as the regression line or least squares line) for the scatterplot of the two variables.

Equation of the Regression Line
• a and b are known as regression coefficients.

How to Find the Regression Coefficients
Let’s return to a previous example and try to write the regression equation.
Now you try with this one.

SPSS Output Example
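The regression coefficients and the coefficient of determination discussed above can be computed with a short Python sketch. It assumes the common textbook convention that the regression line is written y-hat = a + b x, with a the intercept and b the slope; the equation itself did not survive in these notes, so that labeling is an assumption. The function names and the data are hypothetical and are not the truck data or the earlier example.

```python
def fit_line(x, y):
    """Least squares line y_hat = a + b*x (assumed convention: a = intercept, b = slope)."""
    n = len(x)
    sum_x = sum(x)
    sum_y = sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    # Slope from the usual computational formula.
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    # Intercept: the line passes through (mean of x, mean of y).
    a = sum_y / n - b * (sum_x / n)
    return a, b

def r_squared(x, y, a, b):
    """Share of the variation in y explained by the fitted line."""
    mean_y = sum(y) / len(y)
    ss_residual = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_residual / ss_total

# Hypothetical data, NOT the truck data from the slides.
x = [1, 2, 3, 4, 5]
y = [10, 8, 7, 4, 1]

a, b = fit_line(x, y)
r2 = r_squared(x, y, a, b)
print(f"y_hat = {a:.1f} + ({b:.1f})x")  # y_hat = 12.6 + (-2.2)x
print(f"r^2 = {r2:.3f}")                # r^2 = 0.968
```

Read the same way as the truck example: here about 96.8% of the variation in y is accounted for by the linear model, and the remaining 3.2% by other factors. For simple linear regression, this r² equals the square of the Pearson correlation coefficient.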