Basic Statistics: Correlation

[Diagram: relationships and associations among variables (Var)]

The Need for a Measure of Relationship
[Diagram: INDIVIDUAL DIFFERENCES (variance) at the center, surrounded by four goals: Describe, Predict, Explain, Control]

In Research
[Diagram: independent variables X1, X2, X3 and dependent variable Y, labeled "Information" and "Covary?"]

The Concept of Correlation
• Association or relationship between two variables
• Co-relate? (r: the relation between X and Y)
• Covary: go together

Patterns of Covariation
[Diagrams: three patterns in which X and Y covary (go together)]
• Positive correlation
• Zero or no correlation
• Negative correlation

Scatter Plots
Scatter plots allow us to visualize the relationships. The chief purpose of the scatter diagram is to study the nature of the relationship between two variables:
• Linear or curvilinear relationship
• Direction of relationship
• Magnitude (size) of relationship

In each plot below, Variable X runs from low to high on the horizontal axis and Variable Y runs from low to high on the vertical axis.

[Scatter Plot A: an illustration of a perfect positive correlation. Annotations: each point represents both the X and Y scores; exact Y value.]
[Scatter Plot B: an illustration of a positive correlation. Annotation: estimated Y value.]
[Scatter Plot C: an illustration of a perfect negative correlation. Annotation: exact Y value.]
[Scatter Plot D: an illustration of a negative correlation. Annotation: estimated Y value.]
[Scatter Plot E: an illustration of a zero correlation.]
[Scatter Plot F: an illustration of a curvilinear relationship.]

The Measurement of Correlation

The Correlation Coefficient
The degree of correlation between two variables can be described by such terms as "strong," "low," "positive," or "moderate," but these terms are not very precise. If a correlation coefficient is computed between two sets of scores, the relationship can be described more accurately.
A statistical summary of the degree and direction of the relationship or association between two variables can be computed.

Pearson's Product-Moment Correlation Coefficient r

$$r = \frac{\sum XY - \frac{(\sum X)(\sum Y)}{n}}{\sqrt{\left[\sum X^2 - \frac{(\sum X)^2}{n}\right]\left[\sum Y^2 - \frac{(\sum Y)^2}{n}\right]}}$$

[Figure: number line for r running from -1.00 through -.50, 0, and +.50 to +1.00; negative correlation to the left of 0, no relationship at 0, positive correlation to the right]

• Direction of relationship: sign (+ or –)
• Magnitude: 0 through +1 or 0 through -1

The Pearson Product-Moment Correlation Coefficient
Recall that the formula for a variance is:

$$S^2 = \frac{\sum (X - \bar{X})^2}{n - 1} = \frac{\sum (X - \bar{X})(X - \bar{X})}{n - 1}$$

If we replaced the second X that was squared with a second variable, Y, it would be:

$$S_{xy} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{n - 1}$$

This is called a covariance and is an index of the relationship between X and Y.

Conceptual Formula for Pearson r

$$r = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2 \, \sum_{i=1}^{n} (Y_i - \bar{Y})^2}}$$

This formula may be rewritten to reflect the actual method of calculation.

Calculation of Pearson r

$$r = \frac{\sum XY - \frac{(\sum X)(\sum Y)}{n}}{\sqrt{\left[\sum X^2 - \frac{(\sum X)^2}{n}\right]\left[\sum Y^2 - \frac{(\sum Y)^2}{n}\right]}}$$

You should notice that this formula is merely the sum of squares for covariance divided by the square root of the product of the sums of squares for X and Y.

Formulae for Sums of Squares

$$SS_x = \sum X^2 - \frac{(\sum X)^2}{n} \qquad SS_y = \sum Y^2 - \frac{(\sum Y)^2}{n} \qquad SS_{xy} = \sum XY - \frac{(\sum X)(\sum Y)}{n}$$

Therefore, the formula for calculating r may be rewritten as:

Calculation of r Using Sums of Squares

$$r = \frac{SS_{xy}}{\sqrt{SS_x \, SS_y}}$$

An Example
Suppose that a college statistics professor is interested in how the number of hours that a student spends studying is related to how many errors the student makes on the midterm examination.
To determine the relationship, the professor collects the following data:

The Stats Professor's Data

Student   Hours Studied (X)   Errors (Y)   X²     Y²     XY
1         4                   15           16     225    60
2         4                   12           16     144    48
3         5                   9            25     81     45
4         6                   10           36     100    60
5         7                   8            49     64     56
6         7                   4            49     16     28
7         7                   6            49     36     42
8         9                   2            81     4      18
9         9                   4            81     16     36
10        12                  3            144    9      36
Total     ΣX = 70             ΣY = 73      546    695    429

The Data Needed to Calculate the Sums of Squares
ΣX = 70, ΣY = 73, ΣX² = 546, ΣY² = 695, ΣXY = 429, n = 10

$$SS_x = \sum X^2 - \frac{(\sum X)^2}{n} = 546 - \frac{70^2}{10} = 546 - 490 = 56$$

$$SS_y = \sum Y^2 - \frac{(\sum Y)^2}{n} = 695 - \frac{73^2}{10} = 695 - 532.9 = 162.1$$

$$SS_{xy} = \sum XY - \frac{(\sum X)(\sum Y)}{n} = 429 - \frac{(70)(73)}{10} = 429 - 511 = -82$$

Calculating the Correlation Coefficient

$$r = \frac{SS_{xy}}{\sqrt{SS_x \, SS_y}} = \frac{-82}{\sqrt{(56)(162.1)}} = -0.86$$

Thus, the correlation between hours studied and errors made on the midterm examination is -0.86, indicating that more time spent studying is related to fewer errors on the midterm examination. Hopefully an obvious, but now a statistical, conclusion!

Pearson Product-Moment Correlation Coefficient r
[Figure: number line for r from -1 (perfect negative correlation) through 0 (zero correlation) to +1 (perfect positive correlation); negative correlations lie to the left of 0, positive correlations to the right]
[Figure: the same number line with the numerical values -.73, 0, and .35 marked, and the labels Perfect, Strong, Moderate, and Zero correlation]

The Pearson r and Marginal Distribution
The marginal distribution of X is simply the distribution of the X's; the marginal distribution of Y is the frequency distribution of the Y's.

[Figure: bivariate relationship (bivariate normal distribution) with the X variable and Y variable on the axes]

The marginal distributions of X and Y are precisely the same shape.

Interpreting r, the Correlation Coefficient
Recall that r includes two types of information:
• The direction of the relationship (+ or -)
• The magnitude of the relationship (0 to 1)

However, there is a more precise way to use the correlation coefficient, r, to interpret the magnitude of a relationship: the square of the correlation coefficient, r².
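The professor's example can be checked numerically. The following is a minimal sketch in plain Python, assuming the ten (hours, errors) pairs from the data table; the function name `pearson_r` and the variable names are illustrative and not part of the original slides. It uses the sums-of-squares form of the formula and also reports the squared coefficient.

```python
import math

# Data from the stats professor's example: hours studied (X), errors (Y).
hours = [4, 4, 5, 6, 7, 7, 7, 9, 9, 12]
errors = [15, 12, 9, 10, 8, 4, 6, 2, 4, 3]


def pearson_r(x, y):
    """Pearson r via the sums-of-squares formula: SSxy / sqrt(SSx * SSy)."""
    n = len(x)
    ss_x = sum(v * v for v in x) - sum(x) ** 2 / n
    ss_y = sum(v * v for v in y) - sum(y) ** 2 / n
    ss_xy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
    return ss_xy / math.sqrt(ss_x * ss_y)


r = pearson_r(hours, errors)
print(round(r, 2))      # -0.86
print(round(r * r, 2))  # 0.74
```

The squared value, about .74, is the quantity interpreted next.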
The square of r tells us what proportion of the variance of Y can be explained by X, or vice versa. Suppose you wish to estimate Y for a given value of X.

How does correlation explain variance?
[Figure: scatter plot of Variable Y against Variable X (both low to high), with regions labeled "Explained" and "Free to Vary"; 49% of variance is explained. An illustration of how the squared correlation accounts for variance in X, r = .7, r² = .49]

Now, let's look at some correlation coefficients and their corresponding scatter plots. For each plot, what is your estimate of r?

[Scatter plot 1: Current Salary (Y) against Beginning Salary (X)]
r = .87, r² = .76 = 76%

[Scatter plot 2: Current Salary (Y) against Beginning Salary (X)]
r = -1.00, r² = 1.00 = 100%

[Scatter plot 3: Current Salary (Y) against Beginning Salary (X)]
r = +1.00, r² = 1.00 = 100%

[Scatter plot 4: Beginning Salary (Y) against Months since Hire (X)]
r = .04, r² = .002 = .2%

[Scatter plot 5: an unlabeled vertical axis against Time to Accelerate from 0 to 60 mph (sec)]
r = -.44, r² = .19 = 19%

Pearson r assumes that we are using interval or ratio data. What do we do if one or both of the variables were measured at the ordinal level? If we replace the scores with ranks, we can use the same formula. However, the formula can be simplified if we are using ordinal data. The result is called the Spearman Rank-Order Correlation Coefficient.

Spearman's Rank Order Correlation
As noted, the Spearman r_S is a special case of the Pearson r (when the data are ordinal). The formula, derived from the Pearson, is as follows:

$$r_S = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)} \qquad \text{where } d_i = X_i - Y_i$$

The characteristics and interpretation of a Spearman r_S are exactly the same as those of a Pearson r.
That is, r_S ranges from -1 to +1, and the square provides an estimate of the shared variance.

Spearman Rank Order Correlation Coefficient
One or both of the variables are in the form of ranks. Raw data may be converted to ranks, or ranks may be gathered as the original data.

Example: Illustrated Calculation (N = 4)

X    Y    d = X - Y    d²
1    2    -1           1
2    1     1           1
4    4     0           0
3    3     0           0
                 Σd² = 2

$$r_S = 1 - \frac{6 \sum d^2}{n(n^2 - 1)} = 1 - \frac{6(2)}{4(4^2 - 1)} = 1 - \frac{12}{60} = 1 - .20 = .80$$

Choosing Between Pearson and Spearman
• If the data are ordinal, we have no choice: we have to use Spearman.
• If the data are interval or ratio, we do have a choice.
  – Pearson is more sensitive
  – Spearman is easier to compute by hand

Summary of Measures of Relationship
There are other correlation coefficients for other levels of measurement. However, we will only study three: the two we have already reviewed and, later, one more for nominal data.
• The Spearman Rank Correlation Coefficient (r_S)
• The Biserial Correlation Coefficient (r_b)
• The Point-Biserial Correlation Coefficient (r_pb)
• The Phi Correlation Coefficient (φ)
• The Tetrachoric Correlation Coefficient (r_t)
• The Rank-Biserial Correlation Coefficient (r_rb)

Summarizing Correlations
• Pearson and Spearman correlation coefficients range from -1.0 to +1.0
• Pearson and Spearman correlation coefficients indicate both direction and magnitude of the relationship
• Correlation does NOT imply causation
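The illustrated Spearman calculation (N = 4) can be reproduced with a short sketch in plain Python. The function names `spearman_rs` and `pearson_r` are illustrative and not from the slides, and the shortcut formula is assumed to be applied to ranks with no ties. The sketch also applies the Pearson sums-of-squares formula to the same ranks, to show the "special case" relationship described earlier.

```python
import math


def spearman_rs(rank_x, rank_y):
    """Spearman r_S = 1 - 6 * sum(d^2) / (n * (n^2 - 1)); assumes untied ranks."""
    n = len(rank_x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


def pearson_r(x, y):
    """Pearson r via sums of squares: SSxy / sqrt(SSx * SSy)."""
    n = len(x)
    ss_x = sum(v * v for v in x) - sum(x) ** 2 / n
    ss_y = sum(v * v for v in y) - sum(y) ** 2 / n
    ss_xy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
    return ss_xy / math.sqrt(ss_x * ss_y)


# Ranks from the illustrated calculation (N = 4).
x_ranks = [1, 2, 4, 3]
y_ranks = [2, 1, 4, 3]

print(round(spearman_rs(x_ranks, y_ranks), 2))  # 0.8
print(round(pearson_r(x_ranks, y_ranks), 2))    # 0.8
```

Both routes give .80 here because the data are untied ranks; with tied ranks the shortcut formula and Pearson's r computed on the ranks can differ.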