Correlation Analysis
• Correlation analysis is the study of the relationship between variables. It is also defined as a group of techniques to measure the association between two variables.
• A scatter diagram is a chart that portrays the relationship between the two variables. It is the usual first step in correlation analysis.
  – The dependent variable is the variable being predicted or estimated.
  – The independent variable provides the basis for estimation. It is the predictor variable.

The Coefficient of Correlation, r
The coefficient of correlation (r) is a measure of the strength of the linear relationship between two variables. It requires interval or ratio-scaled data.
• It can range from -1.00 to 1.00.
• Values of -1.00 or 1.00 indicate perfect correlation.
• Values close to 0.0 indicate weak correlation.
• Negative values indicate an inverse relationship and positive values indicate a direct relationship.

Correlation (Contd.)
Types of correlation:
1. Positive correlation: occurs when an increase in one variable is accompanied by an increase in the other.
2. Negative correlation: occurs when an increase in one variable is accompanied by a decrease in the other.
3. No correlation: occurs when there is no linear dependency between the variables.
4. Partial correlation: measures the strength of the relationship between two variables while controlling for the effect of one or more other variables.
5. Linear correlation: correlation is said to be linear if the ratio of change between the variables is constant. For example, if the output of a factory doubles when the number of workers is doubled, the correlation is linear.
6. Non-linear correlation: correlation is said to be non-linear if the ratio of change is not constant. For example, doubling the rainfall will not double the harvest.

[Figure: Perfect Correlation]
[Figure: Correlation Coefficient - Interpretation]
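To make the sign and range of r concrete, here is a minimal sketch that computes r with the standard Pearson product-moment formula. The helper name `pearson_r` and all three data sets are hypothetical and chosen only for illustration; the first mirrors the "doubling the workers doubles the output" case.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sxy / sqrt(sxx * syy)

# Hypothetical illustration data (not taken from the slides).
workers = [1, 2, 3, 4, 5]
output = [10, 20, 30, 40, 50]          # rises in fixed proportion to workers
idle_machines = [50, 40, 30, 20, 10]   # falls in fixed proportion to workers

r_direct = pearson_r(workers, output)          # perfect direct relationship
r_inverse = pearson_r(workers, idle_machines)  # perfect inverse relationship

# A purely curved relationship: y depends on x, but not linearly.
x_curve = [-2, -1, 0, 1, 2]
y_curve = [v ** 2 for v in x_curve]
r_curve = pearson_r(x_curve, y_curve)

print(round(r_direct, 2), round(r_inverse, 2), round(r_curve, 2))
```

The third case is a reminder that r close to 0 rules out a linear relationship only; the two variables can still be related in a non-linear way.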
Methods of coefficient of correlation:
I. Karl Pearson's coefficient of correlation
II. Spearman's rank coefficient of correlation
III. Standard error of coefficient of correlation

Karl Pearson's coefficient of correlation:
In its standard product-moment form, r = Σ(X - X̄)(Y - Ȳ) / √[Σ(X - X̄)² × Σ(Y - Ȳ)²], where X̄ and Ȳ are the sample means.

Coefficient of Determination
• The coefficient of determination (r²) is the proportion of the total variation in the dependent variable (Y) that is explained, or accounted for, by the variation in the independent variable (X). It is the square of the coefficient of correlation.
• It ranges from 0 to 1.
• It does not give any information on the direction of the relationship between the variables.

Correlation Coefficient - Example
Using the Copier Sales of America data, compute the correlation coefficient and the coefficient of determination.

How do we interpret a correlation of 0.759? First, it is positive, so there is a direct relationship between the number of sales calls and the number of copiers sold. The value of 0.759 is fairly close to 1.00, so we conclude that the association is strong. However, does this mean that more sales calls cause more sales? No. We have not demonstrated cause and effect here, only that the two variables (sales calls and copiers sold) are related.

Coefficient of Determination (r²) - Example
• The coefficient of determination, r², is 0.576, found by (0.759)².
• This is a proportion or a percent; we can say that 57.6 percent of the variation in the number of copiers sold is explained, or accounted for, by the variation in the number of sales calls.

Lag and Lead in correlation
In the correlation of time series, the investigator may find that there is a time gap before a cause-and-effect relationship is established. For example, the supply of a commodity may increase today, but it may not have an immediate effect on prices; it may take a few days or even months for prices to adjust to the increased supply. The difference in the period before a cause-and-effect relationship is established is called lag.
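A minimal sketch of how a lag changes the pairing of observations. The series below are synthetic and built so that price responds to supply exactly two periods later; `pearson_r` and `lagged_pairs` are hypothetical helper names, and the numbers are illustrative only, not real market data.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def lagged_pairs(cause, effect, lag):
    """Pair cause[t] with effect[t + lag].

    The last `lag` values of `cause` and the first `lag` values of
    `effect` have no partner and are dropped.
    """
    if lag < 0 or lag >= len(cause):
        raise ValueError("lag must be between 0 and len(cause) - 1")
    return cause[:len(cause) - lag], effect[lag:]

# Synthetic supply series (assumption: eight consecutive periods).
supply = [10, 14, 12, 18, 11, 9, 16, 13]

# Price reacts to the supply of two periods earlier: price = 100 - 2 * supply.
# The first two prices are arbitrary, since their causes are unobserved.
price = [78, 70] + [100 - 2 * s for s in supply[:-2]]

r_lag0 = pearson_r(*lagged_pairs(supply, price, 0))  # same-period pairing
r_lag2 = pearson_r(*lagged_pairs(supply, price, 2))  # pairing shifted by the lag

print(round(r_lag0, 3), round(r_lag2, 3))
```

Pairing each supply value with the price of the same period gives a weak coefficient here, while shifting the pairing by the two-period lag recovers the perfect inverse relationship that was built into the data.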
Ignoring this time gap produces fallacious conclusions. The pairing of items is adjusted according to the time lag.

Lag and Lead in correlation
• Pran juice is studying the effect of its latest advertising campaign. People chosen at random are called and asked how many juices they had bought in the past week and how many advertisements they had seen in the past week.

Month:             Jan  Feb  Mar  Apr  May  Jun  Jul  Aug
Number of ads:       3    7    4    2    0    4    1    2
Juice purchased:    11   18    9    4    7    6    3    8

• Develop the estimating equation that best fits the data.
• Calculate the standard error of the estimate.
• Allowing a two-month time lag, calculate the coefficient of correlation.
• Calculate the sample coefficient of determination. Interpret the result.

Testing the Significance of the Correlation Coefficient
H0: ρ = 0 (the correlation in the population is 0)
H1: ρ ≠ 0 (the correlation in the population is not 0)
Test statistic: t = r√(n - 2) / √(1 - r²), with n - 2 degrees of freedom
Reject H0 if: t > tα/2,n-2 or t < -tα/2,n-2

Testing the Significance of the Correlation Coefficient - Example
With α = 0.05 and n - 2 = 8 degrees of freedom (n = 10), reject H0 if:
t > t0.025,8 or t < -t0.025,8
t > 2.306 or t < -2.306

For r = 0.759 and n = 10: t = 0.759 × √8 / √(1 - 0.576) = 3.297

The computed t (3.297) is within the rejection region; therefore, we reject H0. This means the correlation in the population is not zero. From a practical standpoint, it indicates to the sales manager that there is a correlation between the number of sales calls made and the number of copiers sold in the population of salespeople.

Correlation (Contd.)
Spearman's rank coefficient of correlation is suitable for qualitative data such as honesty, efficiency, and intelligence. It is not applicable to large or grouped data sets.
Here, d is the difference between the two ranks.
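With d defined as above, Spearman's coefficient is commonly written in the form below. The symbols r_s and n (the number of ranked pairs) are notation assumed here, not taken from the slides.

$$
r_s = 1 - \frac{6 \sum d^2}{n\,(n^2 - 1)}
$$

When the two rankings agree exactly, every d is 0 and r_s = 1; when one ranking is the exact reverse of the other, r_s = -1.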
If tie in ranks:

Standard error of coefficient of correlation:

Rank correlation
• Compute the rank correlation for the following data:

X:  15  20  28  12  40  60  20  80
Y:  40  30  50  30  20  10  30  60
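The data above contain ties (20 appears twice in X, 30 three times in Y), so the exercise can be worked as in the sketch below. It assumes the commonly taught tie adjustment, in which tied values share the average of their ranks and (m³ - m)/12 is added to Σd² for every group of m tied values. The tie formula the slides intended is not shown, so treat this as one reasonable reading. All function names are hypothetical.

```python
def average_ranks(values):
    """Rank from smallest (rank 1) upward; tied values share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of equal values that starts at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of rank positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def tie_correction(values):
    """Sum of (m**3 - m) / 12 over every group of m tied values."""
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    return sum((m ** 3 - m) / 12 for m in counts.values() if m > 1)

def spearman_with_ties(x, y):
    """Spearman's rank correlation with the additive tie adjustment."""
    n = len(x)
    rx, ry = average_ranks(x), average_ranks(y)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    adjusted = sum_d2 + tie_correction(x) + tie_correction(y)
    return 1 - 6 * adjusted / (n * (n ** 2 - 1))

# Data from the exercise above.
x = [15, 20, 28, 12, 40, 60, 20, 80]
y = [40, 30, 50, 30, 20, 10, 30, 60]

r_s = spearman_with_ties(x, y)
print(r_s)
```

For this data Σd² = 81.5 and the tie adjustments add 0.5 (for X) and 2 (for Y), so r_s = 1 - 6(84)/(8 × 63) = 0: the two rankings show no association. Ranking from largest to smallest instead would give the same coefficient, as long as both variables are ranked in the same direction.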