Correlation

Slide 1
This PowerPoint presentation provides a very simple introduction to correlation. Unfortunately, the term ‘correlation’ is often used incorrectly in everyday English. The point of this lecture is to ensure you understand correlation as a technical term.

Slide 2
This slide illustrates one important point: it’s impossible to calculate a correlation among nominally scaled items. Although it’s possible to calculate a correlation on ordinal data, correlations typically are calculated on metric data, that is, interval or ratio scaled data. So, correlating two variables typically means both variables are measured on an interval or ratio scale.

Slide 3
Here’s the definition of correlation coefficient: a statistical measure of the co-variation and association between two variables. A correlation coefficient for two variables, X and Y, is denoted by the symbol in the middle of the slide. A sample question that might be answered using the correlation coefficient is ‘Are dollar sales associated with advertising dollar expenditures?’ To answer that question in the affirmative, you’d need to show the correlation coefficient differs from zero (0).

Slide 4
The confusion between correlation and causation stems from the everyday usage of the terms ‘correlation’ and ‘correlated’. Here are a few examples of high correlation without causation. The rooster crowing at dawn does not mean the rooster causes the sun to rise. Teachers’ salaries and liquor consumption tend to be positively correlated. The fact that salaries and liquor consumption rise together doesn’t suggest that reducing teachers’ salaries will reduce their liquor consumption; rather, there’s a third variable, the economy, that influences both of these variables. When the economy is booming, teachers’ salaries rise, so they have more disposable income and can better afford to consume more liquor.
It’s not that raising teachers’ salaries causes increased liquor consumption; instead, both teachers’ salaries and liquor consumption are correlated with a third variable, the state of the economy.

Slide 5
One great thing about the correlation coefficient is that it’s standardized; its value ranges from +1 to -1. If the correlation coefficient is +1, then there’s a perfect positive linear relationship, and if it’s -1, then there’s a perfect negative linear relationship. A correlation coefficient of 0 means there’s no linear relationship between X and Y. The strength of a relationship between two variables is reflected by the proximity of the correlation coefficient to either +1 or -1.

Slide 6
These eight patterns provide a concise summary of the linear nature of correlations. Many marketing relationships are not linear, so this is important. (Fortunately, researchers have statistical means for converting non-linear relationships into linear ones.) Graph A shows a correlation coefficient of +1; X and Y are positively correlated in a perfectly linear fashion. Graph B shows that X and Y are also perfectly linearly related, but in a negative way. Graph C shows points that are aligned, but not perfectly, so the correlation coefficient is close to, but not equal to, +1. In this case, X’s value provides a good predictor of Y’s value. In contrast, for both Graphs A and B, X’s value provides a perfect predictor of Y’s value. Graph D shows a correlation coefficient close to -1, with X as a good but not perfect predictor of Y. One could draw a straight line through the points in Graph D that would summarize the relationship between X and Y. Graph E shows a positive but close-to-zero correlation between X and Y. Graph F shows a close-to-zero negative correlation between X and Y. The issue of linearity comes into play with Graphs G and H. For Graph G, knowing X’s value doesn’t help to predict Y’s value; as a result, X and Y are uncorrelated, that is, they have a correlation coefficient equal to zero (0).
The same is true of Graph H; the correlation coefficient also is zero (0). However, you may recall from algebra that Graph H depicts a parabolic function; X’s value perfectly predicts Y’s value. Hence, there’s a powerful relationship between X and Y, but the correlation coefficient is 0. Correlation can indicate whether or not two variables are related in a linear or straight-line way; it can’t detect the type of non-linear relationship shown in Graph H.

Slide 7
For those of you who like formulae, here’s a formula for a simple correlation coefficient.

Slide 8
This slide shows that the correlation between X and Y is related to their variability. The numerator is the covariance between X and Y, the degree to which they vary simultaneously. The denominator is the square root of the product of the variance of X and the variance of Y; equivalently, it is the product of their standard deviations. If the numerator indicates a large amount of covariation, it could be because of huge magnitudes; perhaps X and Y are in millions of units. Hence, a small percent change could be a large unit change. By presenting the correlation coefficient, as opposed to a covariance, we standardize the covariation measure so that it ranges from +1 to -1. Consider correlations as systematic variation between the X variable and the Y variable: How does X vary as Y varies?

Slide 9
Perhaps even more useful from a managerial perspective is the coefficient of determination, which is the correlation coefficient multiplied by itself. Recall the correlation coefficient can range from -1 to +1. As a result, the coefficient of determination ranges from 0 to 1 because multiplying a negative number by a negative number produces a positive number. The coefficient of determination is useful because it indicates the power of one variable to predict changes in another variable.
For example, suppose I’m interested in predicting changes in Y. If much of the variability in Y can be explained statistically by variability in X, then changes in X are highly predictive of changes in Y, and X is a good predictor of Y. The coefficient of determination expresses this as the proportion of the variance in Y that is explained by X.

Slide 10
Correlation examines the relationship of a single variable to another single variable. Marketers often are interested in studying a set of those relationships. They might want to know how variable #1 relates to variable #2, how variable #1 relates to variable #3, how variable #2 relates to variable #3, and so on. A correlation matrix provides a quick way to summarize all those correlations. This matrix also is important because it’s frequently used as a first step in multivariate statistical analyses (like factor analysis).

Slide 11
This slide shows a correlation matrix. This matrix shows that variable #1 correlates perfectly with itself, as do variable #2 and variable #3; hence, the 1’s on the diagonal. The off-diagonal elements indicate to what degree each variable correlates with each other variable. Notice that this matrix is symmetric, in the sense that the correlation between variable #1 and variable #3 is identical to the correlation between variable #3 and variable #1.

Slide 12
(No Audio)

Slide 13
I include this final slide to remind you it’s possible to use non-parametric statistics to compute correlations between ordinally scaled measures. Although not typically used in marketing research, such statistics are available in SPSS. If you compute correlations, then be certain that you first check whether your data are metric (interval or ratio scaled) or ordinal. Otherwise, you’ll be applying the wrong statistical method.
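
For those who would like to try these ideas outside of SPSS, the points from Slides 5 through 11 can be sketched in a few lines of Python. Treat this as a minimal illustration under stated assumptions, not as the procedure SPSS follows: the data values are made up, the names pearson_r, advertising, sales, and third_variable are hypothetical, and the formula assumed is the usual simple (Pearson) correlation, which may be written differently on the slide.

```python
import numpy as np


def pearson_r(x, y):
    """Simple correlation: covariation of X and Y divided by the
    square root of the product of their variations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()  # deviations of X from its mean
    dy = y - y.mean()  # deviations of Y from its mean
    covariation = np.sum(dx * dy)
    scale = np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2))
    return covariation / scale


# Hypothetical data, invented purely for illustration.
advertising = np.array([10.0, 12.0, 15.0, 18.0, 20.0, 25.0])
sales = np.array([110.0, 118.0, 140.0, 150.0, 171.0, 190.0])

# Correlation coefficient and coefficient of determination (r times r).
r = pearson_r(advertising, sales)
r_squared = r * r

# Standardization: changing the units of X and Y changes the
# covariance but leaves the correlation coefficient unchanged.
r_rescaled = pearson_r(advertising * 1_000_000, sales * 1_000_000)

# A parabola like Graph H: X predicts Y perfectly, yet the
# linear correlation is zero.
x = np.array([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0])
y = x ** 2
r_parabola = pearson_r(x, y)

# Correlation matrix for three variables: 1's on the diagonal,
# symmetric off-diagonal elements.
third_variable = np.array([5.0, 3.0, 6.0, 2.0, 7.0, 4.0])
corr_matrix = np.corrcoef([advertising, sales, third_variable])

print("r =", round(r, 3), " r squared =", round(r_squared, 3))
print("r after rescaling =", round(r_rescaled, 3))
print("r for the parabola =", r_parabola)
print(corr_matrix.round(3))
```

The hand-written pearson_r function is there only to make the numerator and denominator visible; in practice NumPy’s corrcoef function returns the same coefficient directly. Note that this sketch covers metric data only; ordinal data would call for a non-parametric measure instead.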