S519: Evaluation of Information Systems
Social Statistics
Ch5: Correlation

This week
- What is correlation?
- How do we compute it?
- How do we interpret it?

Correlation coefficients
- Correlation describes the relationship between two variables: how the value of one variable changes when the value of another variable changes.
- A correlation coefficient is a numerical index that reflects the relationship between two variables.
- Range: -1 to +1
- Bivariate correlation: the correlation between two variables

Correlation coefficients
- Parametric: Pearson product-moment correlation (named for its inventor, Karl Pearson)
- Non-parametric: Spearman's rank correlation; Kendall's tau rank correlation coefficient

Pearson correlation coefficient
- For two variables that are continuous in nature, e.g., height, age, test score, income
- Not for discrete or categorical variables, e.g., race, political affiliation, social class, rank
- r_xy denotes the correlation between variable X and variable Y.

Types of correlation coefficients
- Direct (positive) correlation: both variables change in the same direction.
- Indirect (negative) correlation: the variables change in opposite directions.
- See Table 5.1 (S-p112).
- -0.70 and +0.5: which is stronger?

Pearson product-moment correlation coefficient
$$ r_{xy} = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left[n\sum X^2 - \left(\sum X\right)^2\right]\left[n\sum Y^2 - \left(\sum Y\right)^2\right]}} $$
where
- r_xy = the correlation coefficient between X and Y
- n = the size of the sample
- X = the individual's score on the X variable
- Y = the individual's score on the Y variable
- XY = the product of each X score times its corresponding Y score
- X^2 = the individual X score, squared
- Y^2 = the individual Y score, squared

Exercise: calculate the Pearson correlation coefficient

X   Y
2   3
4   2
5   6
6   5
4   3
7   6
8   5
5   4
6   4
7   5

1. Are variables X and Y correlated?
2. What does this correlation mean?

Using Excel to calculate
- CORREL function, or
- PEARSON function

Visualizing a correlation
- Scatterplot (or scattergram) of the exercise data above
[Figure: scatterplot of Y against X]

Direct (positive) correlation
[Figure: r = 1, a perfect direct (positive) correlation]
- In real-life data, coefficients of 0.7 or 0.8 are often the highest you will see.

Indirect (negative) correlation
[Figure: indirect (negative) correlation]
- Both strength and direction are important.

Excel scatterplot
[Figure: four sets of data (Anscombe's quartet) with the same correlation of 0.816]
- The same r can describe very different patterns, so always plot the data.

Linear correlation
- Linear correlation: the relationship between X and Y follows a straight line.
- Curvilinear correlation: the relationship follows a curve, e.g., age and memory.

More than 2 variables?
How do we calculate the correlation coefficients?
- income: 74190, 80931, 81314, 73089, 62023, 61217, 84526, 87251, 62659, 76450, 70512, 78858, 78628, 86212, 74962, 58828, 61471, 78621, 60071
- education: 13, 12, 11, 11, 11, 10, 11, 11, 12, 10, 12, 9, 13, 14, 9, 11, 10, 12, 9
- attitude: 1, 3, 4, 5, 3, 4, 5, 4, 5, 6, 7, 6, 7, 8, 8, 9, 8, 7, 8
- vote: 1, 2, 2, 2, 2, 2, 1, 1, 2, 2, 2, 1, 1, 2, 2, 4, 5, 5, 4
1. CORREL()
2. The Correlation tool in the Data Analysis toolset

More than 2 variables? Correlation matrix

             Income   Education   Attitude   Vote
Income        1.00
Education     0.35     1.00
Attitude     -0.19    -0.21        1.00
Vote          0.51     0.43        0.55      1.00

Excel Data Analysis tool: Correlation

Meaning of the correlation coefficient
- A raw (unstandardized) measure of association can take any finite value, negative or positive.
- The correlation coefficient always lies between -1.00 and +1.00.

Absolute value of r_xy   Interpretation
0.8 to 1.0               Very strong relationship (share most things in common)
0.6 to 0.8               Strong relationship (share many things in common)
0.4 to 0.6               Moderate relationship (share something in common)
0.2 to 0.4               Weak relationship (share a little in common)
0.0 to 0.2               Weak or no relationship (share very little or nothing in common)

Coefficient of determination
- The coefficient of determination is the percentage of variance in one variable that is accounted for by the variance in the other variable.
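To check the hand calculation in the exercise, the Python sketch below applies the raw-score formula for r_xy term by term (sum of XY, sum of X^2, and so on) and then squares the result to get the coefficient of determination. It also builds a pairwise correlation matrix for the four-variable example. This is an illustration only, not a replacement for CORREL or the Data Analysis tool: the function names `pearson_r` and `correlation_matrix` are hypothetical, and it assumes the four listed columns are income, education, attitude and vote, in that order. The matrix it prints is computed from those listed values and is not guaranteed to reproduce the matrix on the slide.

```python
import math


def pearson_r(x, y):
    """Pearson r computed with the raw-score formula from the slide."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be the same length, with at least 2 pairs")
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))  # sum of XY products
    sum_x2 = sum(a * a for a in x)             # sum of squared X scores
    sum_y2 = sum(b * b for b in y)             # sum of squared Y scores
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        raise ValueError("r is undefined when a variable has no variation")
    return numerator / denominator


def correlation_matrix(columns):
    """Pearson r for every ordered pair of named columns (dict of equal-length lists)."""
    names = list(columns)
    return {(a, b): pearson_r(columns[a], columns[b]) for a in names for b in names}


# Exercise data from the slide
x = [2, 4, 5, 6, 4, 7, 8, 5, 6, 7]
y = [3, 2, 6, 5, 3, 6, 5, 4, 4, 5]
r = pearson_r(x, y)
print(f"r_xy = {r:.3f}, r^2 = {r * r:.3f}")  # prints: r_xy = 0.692, r^2 = 0.479

# Four-variable example. ASSUMPTION: the four listed columns are
# income, education, attitude and vote, in that order.
data = {
    "income": [74190, 80931, 81314, 73089, 62023, 61217, 84526, 87251, 62659, 76450,
               70512, 78858, 78628, 86212, 74962, 58828, 61471, 78621, 60071],
    "education": [13, 12, 11, 11, 11, 10, 11, 11, 12, 10, 12, 9, 13, 14, 9, 11, 10, 12, 9],
    "attitude": [1, 3, 4, 5, 3, 4, 5, 4, 5, 6, 7, 6, 7, 8, 8, 9, 8, 7, 8],
    "vote": [1, 2, 2, 2, 2, 2, 1, 1, 2, 2, 2, 1, 1, 2, 2, 4, 5, 5, 4],
}
matrix = correlation_matrix(data)
names = list(data)
print(" " * 10 + "".join(f"{name:>10}" for name in names))
for i, row in enumerate(names):
    # Print the lower triangle only, the layout Excel's Correlation tool uses
    cells = "".join(f"{matrix[(row, col)]:>10.2f}" for col in names[: i + 1])
    print(f"{row:<10}{cells}")
```

Writing out the sums by hand (n = 10, sum X = 54, sum Y = 43, sum XY = 247, sum X^2 = 320, sum Y^2 = 201) and substituting them into the formula gives the same value, which is a useful cross-check on the exercise.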
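- The coefficient of determination equals the square of the correlation coefficient, r^2.
- Example: the correlation between GPA and studying time is r_GPA.Time = 0.70, so
$$ r^2_{\text{GPA}\cdot\text{Time}} = 0.70^2 = 0.49 $$
- That is, 49% of the variance in GPA can be explained by the variance in studying time.

Coefficient of nondetermination
- The amount of unexplained variance, 1 - r^2, is called the coefficient of nondetermination (also called the coefficient of alienation).

correlation (r)   determination (r^2)   interpretation
0                 0                     0% of the variance explained (100% unexplained)
0.5               0.25                  25% of the variance explained (75% unexplained)
0.9               0.81                  81% of the variance explained (19% unexplained)

Ice cream and crime
- In a small town in Greece, the local police found a direct correlation between ice cream consumption and crime.

Correlation vs. causality
- Correlation represents the association between two or more variables.
- By itself, correlation says nothing about causality: two variables can be correlated without either one causing the other.
- Ice cream and crime are correlated, but ice cream does not cause crime.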
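The determination table in this section can be reproduced with a few lines of Python. The sketch below is a minimal illustration, not a library API: the helper names `determination` and `nondetermination` are hypothetical. For each correlation it prints r, the coefficient of determination r^2, and the explained and unexplained (1 - r^2) percentages, including the GPA and studying-time example (r = 0.70) and a negative r to show that the sign does not change r^2.

```python
def determination(r):
    """Coefficient of determination: proportion of variance explained, r squared."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie between -1 and +1")
    return r * r


def nondetermination(r):
    """Coefficient of nondetermination: proportion of variance unexplained, 1 - r^2."""
    return 1.0 - determination(r)


# Rows of the table, the GPA/studying-time example (0.70),
# and a negative r to show that the sign does not affect r^2.
for r in (0.0, 0.5, 0.9, 0.70, -0.70):
    print(f"r = {r:+.2f}   r^2 = {determination(r):.2f}   "
          f"explained = {determination(r):.0%}   unexplained = {nondetermination(r):.0%}")
```

Because r^2 depends only on the size of r, r = -0.70 and r = +0.70 explain the same 49% of the variance; the sign only tells you the direction of the relationship.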