Correlation and Regression

Bivariate data
When measurements on two characteristics are studied simultaneously because of their interdependence, the observations come in pairs. Such a set of paired data is called bivariate data.

Covariance
While variance measures the variation among the observations in a data set, covariance measures the joint variation among the pairs of observations in a bivariate data set, i.e. it indicates the direction of the linear relationship between the two variables. Because its magnitude depends on the units of measurement, it cannot be used to compare the strength of the linear relationship across different pairs of variables. Hence the need to study the concept of correlation.

Correlation
Correlation analysis: When changes in one variable are accompanied by changes in the other variable, the two variables are said to be correlated.

Correlation is classified as:
• Positive
  - Perfect
  - Imperfect (strong or weak)
• Zero
• Negative
  - Perfect
  - Imperfect (strong or weak)

Methods of assessing Correlation
Scatter diagram: A scatter diagram is the graphical method of assessing correlation between two variables.

[Figure: scatter diagrams of Y against X showing perfect positive, perfect negative, imperfect positive, imperfect negative, and no correlation.]

• Correlation is measured with the help of the correlation coefficient r.
• Its value always lies between -1 and +1, i.e. -1 ≤ r ≤ 1.

Classification by the value of r:
• Positive correlation: 0 < r ≤ 1
  - Perfect positive correlation: r = 1
  - Imperfect positive correlation: 0 < r < 1 (weak positive as r tends to 0, strong positive as r tends to 1)
• No correlation: r = 0
• Negative correlation: -1 ≤ r < 0
  - Perfect negative correlation: r = -1
  - Imperfect negative correlation: -1 < r < 0 (weak negative as r tends to 0, strong negative as r tends to -1)

Karl Pearson’s Coefficient of correlation
Karl Pearson defined the coefficient of correlation as a measure of the intensity or degree of linear relationship between two variables.
Let X and Y be the two variables with n pairs of observations, represented as (x_i, y_i), i = 1, 2, …, n. The coefficient is the covariance of X and Y divided by the product of their standard deviations:

$$ r = \frac{\operatorname{Cov}(X, Y)}{\sigma_X \, \sigma_Y} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2} \, \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}} $$

Spurious Correlation
When the correlation coefficient is high and suggests a significant relationship, but no logical relationship exists between the two variables, the correlation is called spurious correlation.
Ex. The number of students receiving a graduate degree every year and the number of auto accidents in the city.

Coefficient of Determination
The square of the correlation coefficient r, written r², is known as the coefficient of determination. It indicates the extent to which the variation in one variable is explained by the variation in the other.
Ex. If the correlation coefficient between x and y is 0.9, the coefficient of determination is 0.81. This means that 81% of the variation in y is explained by the variation in x, and the remaining 19% is due to other factors.
The quantity 1 - r² is referred to as the coefficient of nondetermination. The square root of the coefficient of nondetermination is known as the coefficient of alienation.

Rank Correlation
Sometimes the data on two variables cannot be measured quantitatively. In such situations the observations can be ranked. Karl Pearson’s correlation coefficient is not an appropriate measure for qualitative data. Hence Spearman defined a coefficient of correlation for qualitative data, called Spearman’s Rank Correlation coefficient.
E.g. ranks given by judges in a beauty contest.

Spearman’s Rank Correlation Coefficient (R)

$$ R = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n(n^2 - 1)} $$

where
d_i = X_i - Y_i
X_i : Rank assigned by Judge 1
Y_i : Rank assigned by Judge 2
n : Number of pairs of observations

Case of Tied Ranks
A correction factor m(m² - 1)/12 has to be added to Σd_i² for each tie:

$$ R = 1 - \frac{6 \left[ \sum_{i=1}^{n} d_i^2 + \sum \frac{m(m^2 - 1)}{12} \right]}{n(n^2 - 1)} $$

where m is the number of individuals sharing a tied rank, and the second sum runs over all ties.
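
As an illustration, the rank correlation formulas above can be sketched in Python. This is a minimal sketch, not a reference implementation: the function names (average_ranks, tie_correction, spearman_r) and the judge data are hypothetical, and it assumes that tied observations share the average of the ranks they would otherwise occupy.

```python
from collections import Counter


def average_ranks(values):
    """Rank values from 1 (smallest) upward; tied values share the mean rank.

    Assumption: ties receive the average of the ranks they span.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the group while the next value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def tie_correction(ranks):
    """Sum of m(m^2 - 1)/12 over every group of m equal ranks (m > 1)."""
    return sum(m * (m * m - 1) / 12 for m in Counter(ranks).values() if m > 1)


def spearman_r(x_ranks, y_ranks):
    """R = 1 - 6 * (sum of d_i^2 + tie corrections) / (n (n^2 - 1))."""
    n = len(x_ranks)
    if n != len(y_ranks) or n < 2:
        raise ValueError("need two rank lists of equal length, n >= 2")
    sum_d2 = sum((x - y) ** 2 for x, y in zip(x_ranks, y_ranks))
    correction = tie_correction(x_ranks) + tie_correction(y_ranks)
    return 1 - 6 * (sum_d2 + correction) / (n * (n * n - 1))


# Hypothetical ranks given by two judges to five contestants (no ties).
judge_1 = [1, 2, 3, 4, 5]
judge_2 = [2, 1, 4, 3, 5]
print(round(spearman_r(judge_1, judge_2), 4))  # sum of d^2 = 4, so R = 0.8

# Hypothetical scores with one tie (two values of 20 share rank 2.5).
scores = [10, 20, 20, 30]
print(round(spearman_r(average_ranks(scores), [1, 2, 3, 4]), 4))  # 0.9
```

When there are no ties the correction term is zero, so the same function covers both the simple and the tied-rank formula.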