Correlation

Slide 1
This PowerPoint presentation provides a very simple introduction to correlation. Unfortunately,
the term ‘correlation’ often is used incorrectly in everyday English. The point of this lecture is to
ensure you understand correlation as a technical term.
Slide 2
This slide illustrates one important point: it’s impossible to calculate a correlation among
nominally scaled items. Although it’s possible to calculate a correlation on ordinal data,
correlations typically are calculated on metric data (that is, interval or ratio scaled data). So,
correlating two variables typically means working with two variables that are each measured on
an interval or ratio scale.
Slide 3
Here’s the definition of correlation coefficient: a statistical measure of the co-variation and
association between two variables. A correlation coefficient for two variables, X and Y, is
denoted by the symbol in the middle of the slide. A sample question that might be answered
using the correlation coefficient is ‘Are dollar sales associated with advertising dollar
expenditures?’ To answer that question in the affirmative, you’d need to show that the
correlation coefficient differs significantly from zero (0).
Slide 4
The confusion between correlation and causation stems from the everyday use of the terms
‘correlation’ and ‘correlated’. Here are a few examples illustrating that correlation does not
mean causation; in other words, cases of high correlation with no causation.

The rooster crowing at dawn does not mean the rooster causes the sun to rise.

Teachers’ salaries and liquor consumption tend to be positively correlated. An increase
in salaries with an increase in liquor consumption doesn’t suggest that reducing
teachers’ salaries will reduce their liquor consumption; rather, there’s a third variable—
the economy—that co-varies and influences these two variables. When the economy is
booming, teachers’ salaries rise, so they have more disposable income and can better
afford to consume more liquor. It’s not that raising teachers’ salaries causes increased
liquor consumption; instead, both teachers’ salaries and liquor consumption are correlated
with a third variable, the state of the economy.
Slide 5
One great thing about the correlation coefficient is that it’s standardized; its value ranges from
-1 to +1. If the correlation coefficient is +1, then there’s a perfect positive linear relationship,
and if it’s -1, then there’s a perfect negative linear relationship. A correlation coefficient of 0
means that X and Y have no linear relationship. The strength of the linear relationship between
two variables is reflected by the proximity of the correlation coefficient to either +1 or -1.
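To make the range concrete, here’s a minimal Python sketch. Python isn’t part of this lecture, and the helper name pearson_r and the small data sets are made up purely for illustration. It computes the coefficient for an exact upward line, an exact downward line, and a noisier upward pattern.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations (numerator).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (denominator terms).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

x = [1, 2, 3, 4, 5]                              # illustrative values only
r_pos = pearson_r(x, [2 * v + 1 for v in x])     # points on a rising straight line
r_neg = pearson_r(x, [10 - 3 * v for v in x])    # points on a falling straight line
r_noisy = pearson_r(x, [2, 1, 4, 3, 5])          # upward trend with scatter

print(r_pos, r_neg, r_noisy)
```

The first two coefficients sit at the ends of the range, and the third falls between 0 and +1 because its points only roughly follow a line.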
Slide 6
These eight patterns provide a concise summary of the linear nature of correlations. Many
marketing relationships are not linear, so this is important. (Fortunately, researchers have
statistical means for converting non-linear relationships into linear ones.)
Graph A shows a correlation coefficient of +1; X and Y are positively correlated in a perfectly
linear fashion. Graph B shows that X and Y are also perfectly linearly related, but in a negative
way. Graph C shows points that are aligned, but not perfectly, so the correlation coefficient is
close to but not equal to +1. In this case, X’s value provides a good predictor of Y’s value. In
contrast, for both graphs A and B, X’s value provides a perfect predictor of Y’s value. Graph D
shows a correlation coefficient close to -1, with X as a good but not perfect predictor of Y. One
could draw a straight line through the points in Graph D that would summarize the relationship
between X and Y. Graph E shows a positive but close-to-zero correlation between X and Y.
Graph F shows a close-to-zero negative correlation between X and Y.
The issue of linearity comes into play with Graphs G and H. For Graph G, knowing X’s value
doesn’t help to predict Y’s value; as a result, X and Y are uncorrelated, meaning the coefficient
equals zero (0). Graph H also has a correlation coefficient of zero (0).
However, you may recall from algebra that Graph H depicts a parabolic function; X’s value
perfectly predicts Y’s value. Hence, there’s a powerful relationship between X and Y, but the
correlation coefficient is 0. Correlation can indicate whether or not two variables are related in a
linear or straight-line way; it can’t detect the type of non-linear relationships shown in Graph H.
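The Graph H situation can be sketched in a few lines of Python. The data are an illustrative parabola of my own (Y equals X squared over a range symmetric around zero), not the values behind the slide, and pearson_r is a hypothetical helper. The sketch also shows one simple way a non-linear relationship can be converted into a linear one: correlate Y with the square of X instead of X itself.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

x = [-3, -2, -1, 0, 1, 2, 3]          # symmetric around zero (assumed data)
y = [v ** 2 for v in x]               # Y is perfectly determined by X

r_raw = pearson_r(x, y)               # linear correlation between X and Y
x_squared = [v ** 2 for v in x]       # transform X before correlating
r_transformed = pearson_r(x_squared, y)

print(r_raw, r_transformed)
```

The raw coefficient is zero even though X predicts Y perfectly, while the transformed coefficient is +1, because the positive and negative halves of the parabola cancel only in the untransformed data.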
Slide 7
For those of you who like formulae, here’s a formula for a simple correlation coefficient.
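The slide’s formula isn’t reproduced in these notes. As an assumption about what it shows, here is the standard Pearson product-moment form written in LaTeX, where n is the number of paired observations and the bars denote sample means; the symbol r_{XY} is my own notation choice.

```latex
r_{XY} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}
              {\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2 \; \sum_{i=1}^{n} (Y_i - \bar{Y})^2}}
```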
Slide 8
This slide shows that the correlation between X and Y is related to their variability. The
numerator is the covariance between X and Y; that is, the degree to which they vary together.
The denominator is the square root of the product of the variances of X and Y (equivalently, the
product of their standard deviations). If the numerator indicates a large amount of covariation,
it could be because of huge magnitudes; perhaps X and Y are in millions of units. Hence, a small
percent change could be a large unit change. By presenting the correlation coefficient, as
opposed to a covariance, we standardize the covariation measure so that it ranges from -1 to
+1. Consider correlations as systematic variation between the X variable and the Y variable:
How does X vary as Y varies?
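Here’s a minimal Python sketch of why standardizing matters. The advertising and sales figures are invented for illustration, and the function names are hypothetical. The same data are expressed once in thousands of dollars and once in dollars; the covariance changes with the units, but the correlation doesn’t.

```python
import math

def sample_covariance(xs, ys):
    """Sample covariance (n - 1 in the denominator)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

def correlation(xs, ys):
    """Covariance divided by the product of the two standard deviations."""
    sd_x = math.sqrt(sample_covariance(xs, xs))   # variance is cov(X, X)
    sd_y = math.sqrt(sample_covariance(ys, ys))
    return sample_covariance(xs, ys) / (sd_x * sd_y)

# Invented figures, first in thousands of dollars, then in dollars.
ad_thousands = [10, 20, 30, 40, 50]
sales_thousands = [120, 150, 210, 230, 290]
ad_dollars = [v * 1000 for v in ad_thousands]
sales_dollars = [v * 1000 for v in sales_thousands]

cov_thousands = sample_covariance(ad_thousands, sales_thousands)
cov_dollars = sample_covariance(ad_dollars, sales_dollars)
r_thousands = correlation(ad_thousands, sales_thousands)
r_dollars = correlation(ad_dollars, sales_dollars)

print(cov_thousands, cov_dollars)   # covariance depends on the units
print(r_thousands, r_dollars)       # correlation does not
```

Switching units multiplies the covariance by a million here, which is why a covariance alone says little about the strength of a relationship.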
Slide 9
Perhaps even more useful from a managerial perspective is the coefficient of determination,
which is the correlation coefficient multiplied by itself. Recall the correlation coefficient can
range from -1 to +1. As a result, the coefficient of determination ranges from 0 to 1 because
multiplying a negative number by a negative number produces a positive number. The
coefficient of determination is useful because it indicates the power of one variable to predict
changes in another variable. For example, suppose I’m interested in predicting changes in Y. If
much of the variability in Y can be explained statistically by variability in X (that is, if the
coefficient of determination is close to 1), then I know that X is a good predictor of Y.
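A short Python sketch may help here. The data and the helper pearson_r are illustrative assumptions of mine. It squares a positive and a negative correlation to show that both give the same coefficient of determination, and then checks that value against the share of Y’s variability accounted for by a least-squares straight line (the line fit is my assumption about what ‘explained’ means in this context).

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

x = [1, 2, 3, 4, 5]            # illustrative values only
y_up = [2, 1, 4, 3, 5]         # tends to rise with x
y_down = [5, 3, 4, 1, 2]       # tends to fall with x

r_up = pearson_r(x, y_up)
r_down = pearson_r(x, y_down)
r2_up = r_up ** 2              # coefficient of determination
r2_down = r_down ** 2          # same value despite the opposite sign of r

# Share of y_up's variability accounted for by a least-squares line in x.
mean_x = sum(x) / len(x)
mean_y = sum(y_up) / len(y_up)
slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y_up))
         / sum((a - mean_x) ** 2 for a in x))
intercept = mean_y - slope * mean_x
ss_residual = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y_up))
ss_total = sum((b - mean_y) ** 2 for b in y_up)
explained_share = 1 - ss_residual / ss_total

print(r_up, r_down, r2_up, r2_down, explained_share)
```

With these numbers the correlations are +0.8 and -0.8, so the coefficient of determination is 0.64 in both cases, and that matches the explained share from the fitted line.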
Slide 10
Correlation examines the relationship of a single variable to another single variable. Marketers
often are interested in studying a set of those relationships. They might want to know how
variable #1 relates to variable #2, how variable #1 relates to variable #3, how variable #2 relates
to variable #3, and so on. A correlation matrix provides a quick way to summarize all those
correlations. This matrix also is important because it’s frequently used as a first step in
multivariate statistical analyses (like factor analysis).
Slide 11
This slide shows a correlation matrix. This matrix shows that variable #1 correlates perfectly
with itself, as do variable #2 and variable #3; hence, the 1’s on the diagonal. The off-diagonal
elements indicate to what degree each variable correlates with each other variable. Notice that
this matrix is symmetric, in the sense that the correlation between variable #1 and variable #3 is
identical to the correlation between variable #3 and variable #1.
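Here’s a minimal Python sketch of how such a matrix might be built. The three variables and their values are invented, not the ones on the slide, and pearson_r is a hypothetical helper. Each cell holds the correlation between the row variable and the column variable.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Invented data for three variables measured on the same five cases.
variables = {
    "var1": [1, 2, 3, 4, 5],
    "var2": [2, 1, 4, 3, 5],
    "var3": [5, 3, 4, 1, 2],
}
names = list(variables)

# One row per variable, one column per variable.
matrix = [[pearson_r(variables[a], variables[b]) for b in names] for a in names]

for name, row in zip(names, matrix):
    print(name, [round(value, 2) for value in row])
```

The printed rows have 1’s on the diagonal, and each off-diagonal value appears twice, once above and once below the diagonal.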
Slide 12 (No Audio)
Slide 13
I include this final slide to remind you it’s possible to use non-parametric statistics to compute
correlations between ordinally scaled measures. Although not typically used in marketing
research, such statistics are available in SPSS. If you compute correlations, then be certain that
you first check whether your data are metric (interval or ratio scaled) or ordinal. Otherwise,
you may apply the wrong statistical method.
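One widely used non-parametric option is Spearman’s rank correlation, which amounts to ranking each variable and then correlating the ranks. The slide doesn’t say which statistic it has in mind, so treat this Python sketch as an assumption; the helper names and the two sets of rankings are invented for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def average_ranks(values):
    """Rank values from 1 upward; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1   # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks

def spearman_rho(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

# Invented example: two judges each rank the same six brands (1 = best).
judge_a = [1, 2, 3, 4, 5, 6]
judge_b = [2, 1, 4, 3, 6, 5]
rho = spearman_rho(judge_a, judge_b)

print(round(rho, 3))
```

Because only the ordering of the values is used, this approach doesn’t assume equal distances between scale points, which is what makes it suitable for ordinal measures.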