Correlation: a measure of the strength of association among and between variables.
Correlation coefficient or correlation index (r): expresses the strength (degree) of association between variables.
Regression (whose fit is summarised by r², the coefficient of determination): predicts the value of one variable given a value of another variable.

CORRELATION (Association)
Co – Together; Relation – Connection

Meaning of correlation: Correlation is defined as the relationship between two or more variables.

Definition:
"The degree of association between two variables"
"A measure of the strength of association among and between variables"

Example:
1) Income and standard of living of a person
2) Monsoon and agricultural production in a particular season
3) Relationship between price and demand

Uses of correlation:
Before dealing with the various methods of correlation, it is necessary to know the uses of correlation in statistical analysis, which are as follows:
1) It is used to derive precisely the degree and direction of the relationship between variables such as price and demand, advertising expenditure and sales, or rainfall and crop yield.
2) It is used in developing the concepts of regression and ratio of variation, which help in estimating the value of one variable for a given value of another variable.
3) It is used to reduce the range of uncertainty in prediction.
4) It is used to present the average relationship between any two variables through a single value, the coefficient of correlation.
5) In economics, it is used to understand economic behaviour and to locate the important variables on which others depend.
6) In business, it is used to estimate the cost of sales, volume of sales, sales price and other values on the basis of other variables to which they are financially related.
7) In science and philosophy, too, the methods of correlation are widely used in advancing the respective fields.
8) In the study of nature, it is used in observing the multiplicity of inter-related forces.

Types of Correlation:

1. In terms of direction of variables
Scatter plots are constructed by plotting two variables along the horizontal (x) and vertical (y) axes. Note that the more closely the cluster of dots resembles a straight line, the stronger the correlation.

POSITIVE CORRELATION
Meaning: The two random variables increase (or decrease) together.
Example: There is a positive correlation between height and weight: weight increases as height increases.

NEGATIVE CORRELATION
Meaning: One of the random variables increases as the other decreases.
Example: There is a negative correlation between speed and the amount of time it takes to get somewhere: as speed increases, it takes less time to reach a destination.

NO CORRELATION
Meaning: There is no linear relationship between the two random variables.
Example: There is no correlation between being able to write in cursive and the number of fish in the ocean.

2. In terms of number of variables

SIMPLE CORRELATION
Meaning: Correlation is said to be simple if only two variables are analysed: one dependent variable and one independent variable.
Example: Correlation is simple when it is studied between demand and supply, or between income and expenditure.

PARTIAL CORRELATION
Meaning: Three or more variables are considered for analysis (one dependent variable and more than one independent variable), but only one independent variable is studied against the dependent variable while the other influencing variables are kept constant.

MULTIPLE CORRELATION
Meaning: Three or more variables (one dependent variable and more than one independent variable) are studied simultaneously.
Example of partial correlation: Correlation analysis is done with demand, supply and income, where income is kept constant.
Example of multiple correlation: Rainfall, production of rice and price of rice are studied simultaneously.

3. In terms of shape
The distinction between linear and non-linear correlation is based on the constancy of the ratio of change between the variables.

Linear Correlation
Meaning: If the amount of change in one variable tends to bear a constant ratio to the amount of change in the other variable, the correlation is said to be linear. In other words, when all the points on the scatter diagram tend to lie near a straight line, the correlation is linear.
Example: When the output of a factory is doubled by doubling the number of workers, the correlation is linear.

Non-Linear Correlation
Meaning: Correlation is said to be non-linear if the ratio of change is not constant. In other words, when all the points on the scatter diagram tend to lie near a smooth curve, the correlation is non-linear (curvilinear).

Representation of Correlation:
Correlation between two random variables is typically presented graphically using a scatter plot, or numerically using a correlation coefficient.

SCATTER DIAGRAM
A scatter diagram displays the relationship between two variables graphically. It helps to identify the direction of the association between the variables under study and gives a visual impression of whether the association is strong or weak, but it does not quantify the intensity of the association. That is given by the correlation coefficient, which expresses both direction and intensity.

CORRELATION COEFFICIENT
The index of the degree of relationship between two continuous variables is known as the correlation coefficient (r).
It was developed by Karl Pearson. It is also called Pearson's coefficient or the product moment correlation.

Assumptions of the correlation coefficient:
1. The variables under study are continuous random variables and are normally distributed.
2. The relationship between the variables is linear.
3. Each pair of observations is independent of every other pair.

Properties of the correlation coefficient:
1. It is a unit-free measure and is denoted by r.
2. It is not affected by a change of origin, of scale, or of both.
3. It ranges from -1 to +1.

The absolute value of the correlation coefficient gives the strength of the relationship: the larger the number, the stronger the relationship. For example, |-0.75| = 0.75, which indicates a stronger relationship than 0.65.

Value of r and its interpretation, in order from r = -1 to r = +1:
Perfect negative correlation (r = -1)
Strong negative correlation
Moderate negative correlation
Weak negative correlation
No correlation (r = 0)
Weak positive correlation
Moderate positive correlation
Strong positive correlation
Perfect positive correlation (r = +1)

TYPES OF CORRELATION COEFFICIENT
Coefficients are grouped as parametric (for example Karl Pearson's) or non-parametric (for example Spearman's rank). The choice depends on the measurement level of the two variables:

Coefficient        Variable 1        Variable 2
Karl Pearson       Interval/Ratio    Interval/Ratio
Spearman's Rank    Ordinal           Ordinal
Point Biserial     Dichotomous       Interval/Ratio
Phi Correlation    Dichotomous       Dichotomous
Chi Square         Nominal           Nominal

Measurement Level
Qualitative: Nominal, Ordinal
Quantitative: Interval, Ratio

Nominal:
1. A nominal variable is the most basic level of measurement.
2. It is also known as categorical or qualitative.
3. Examples: sex, colour.
4. Nominal variables can be stored as a word or text, or given a numerical code.
5. To summarise nominal data we use frequencies or percentages. We cannot compute a mean.
6. Represented graphically as a pie chart or bar diagram.

Ordinal:
1. Examples: rank, satisfaction, fanciness, likelihood.
2. The gap between one value and the next differs.
That is, the gap between unsatisfied and very unsatisfied may be small, while the gap between unsatisfied and satisfied may be large.
3. Represented graphically as a bar diagram; a pie chart must not be used.

Interval/Ratio:
1. The most precise level of measurement is interval/ratio.
2. Examples: number of persons, weight, age and size.
3. Interval/ratio data is also known as scale, quantitative or parametric.
4. It may be discrete or continuous.
5. Represented graphically as a bar chart.

COVARIANCE:
In probability theory and statistics, covariance is a measure of the joint variability of two random variables. Covariance is measured in units, obtained by multiplying the units of the two variables.

Positive covariance: indicates that the two variables tend to move in the same direction.
Negative covariance: indicates that the two variables tend to move in opposite directions.

The sign of the covariance therefore shows the tendency in the linear relationship between the variables. The magnitude of the covariance is not easy to interpret because it is not normalized and hence depends on the magnitudes of the variables. The normalized version of the covariance, the correlation coefficient, shows by its magnitude the strength of the linear relationship.

Formula for covariance:
The covariance between two random variables X and Y can be calculated using the following formula (for a population of N pairs):

$$ \mathrm{Cov}(X, Y) = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{N} $$

The covariance of two random variables is a population parameter that can be seen as a property of their joint probability distribution.

For a sample of n pairs, the formula is slightly adjusted:

$$ \mathrm{Cov}(X, Y) = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1} $$

The sample covariance, in addition to describing the sample, serves as an estimate of the population parameter.

Covariance indicates the direction of the linear relationship between variables.
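The population and sample covariance formulas above can be sketched in plain Python as follows. This is a minimal illustration, not a reference implementation: the function name `covariance`, its `sample` flag and the height, weight, speed and time values are all hypothetical and chosen only to show the two divisors and the meaning of the sign.

```python
def covariance(xs, ys, sample=True):
    """Covariance of paired observations.

    Divides by n - 1 for a sample and by n for a population,
    matching the two formulas given in the text.
    """
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n < 2:
        raise ValueError("at least two pairs are needed")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the two means.
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return cross / (n - 1 if sample else n)


# Hypothetical data: heights in cm and weights in kg (move together).
heights_cm = [150, 160, 170, 180, 190]
weights_kg = [50, 58, 66, 77, 84]

# Hypothetical data: speed in km/h and travel time in hours (move oppositely).
speeds = [40, 50, 60, 80]
times = [3.0, 2.4, 2.0, 1.5]

pop_cov = covariance(heights_cm, weights_kg, sample=False)  # units: cm * kg
samp_cov = covariance(heights_cm, weights_kg, sample=True)  # units: cm * kg
neg_cov = covariance(speeds, times)                         # negative sign

print(pop_cov, samp_cov, neg_cov)
```

The sample value is larger than the population value for the same data because of the smaller divisor. Neither magnitude says how strong the relationship is, since both depend on the units of the variables.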
Correlation, by contrast with covariance, measures both the strength and the direction of the linear relationship between two variables. Correlation is a function of the covariance: it is the scaled measure of covariance. It is dimensionless. In other words, the correlation coefficient is always a pure number and is not measured in any units.

KARL PEARSON:
It is a quantitative method of calculating correlation. The Pearson correlation coefficient (named for Karl Pearson) can be used to summarize the strength of the linear relationship between two data samples. It is calculated as the covariance of the two variables divided by the product of the standard deviations of the two data samples.

$$ \rho(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \, \sigma_Y} $$

Where:
ρ(X,Y): the correlation between the variables X and Y
Cov(X,Y): the covariance between the variables X and Y
σX: the standard deviation of the X variable
σY: the standard deviation of the Y variable

Substituting the definitions of covariance and standard deviation, the divisor N cancels:

$$ r = \frac{\frac{1}{N}\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\frac{1}{N}\sum (x_i - \bar{x})^2} \, \sqrt{\frac{1}{N}\sum (y_i - \bar{y})^2}} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2} \, \sqrt{\sum (y_i - \bar{y})^2}} $$

Merits of Karl Pearson's correlation coefficient:
1. This method not only indicates the presence or absence of correlation between any two variables but also determines the exact extent, or degree, to which they are correlated.
2. Under this method, we can also ascertain the direction of the correlation, i.e. whether the correlation between the two variables is positive or negative.
3. This method enables us to estimate the value of a dependent variable for a particular value of an independent variable through regression equations.
4. This method has many algebraic properties, which make it easy to calculate the coefficient of correlation and a host of related quantities such as the coefficient of determination.

Demerits of Karl Pearson's correlation coefficient:
1. It is comparatively difficult to calculate, as its computation involves intricate algebraic methods.
2. It is strongly affected by extreme values.
3. It is based on a large number of assumptions, viz. linear relationship, cause and effect relationship etc., which may not always hold.
4. It is very likely to be misinterpreted, particularly in the case of homogeneous data.
5. In comparison with the other methods, it takes much time to arrive at the results.
6. It is subject to probable error, which its propounder himself admits, and therefore it is always advisable to compute its probable error while interpreting its results.

SPEARMAN'S RANK CORRELATION:
This method is a development over Karl Pearson's method of correlation in that (i) it does not need the quantitative expression of the data, and (ii) it does not assume that the population under study is normally distributed. This method was introduced by the British psychologist Charles Edward Spearman in 1904. Under this method, correlation is measured on the basis of the ranks rather than the original values of the variables. For this, the values of the two variables are first converted into ranks in a particular order.

POINT-BISERIAL CORRELATION COEFFICIENT:
The point-biserial correlation is mathematically equivalent to the Pearson (product moment) correlation; that is, if we have one continuously measured variable X and a dichotomous variable Y, the Pearson correlation between X and Y equals the point-biserial coefficient.

PHI COEFFICIENT:
The phi coefficient is a symmetrical statistic, which means the independent and dependent variables are interchangeable.
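The coefficients described above can be compared in a short plain-Python sketch. It rests on assumptions not stated in these notes: Spearman's coefficient is computed here by applying Pearson's formula to the ranks (tied values share the average rank), and the point-biserial coefficient uses the common textbook form based on the two group means and the population standard deviation. All function names and data values are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's r: cross-deviation sum over the root of the squared-deviation sums."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable is constant")
    return sxy / sqrt(sxx * syy)


def average_ranks(values):
    """Rank values from 1 (smallest); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman's coefficient as Pearson's r computed on the ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


def point_biserial(xs, groups):
    """Point-biserial coefficient for continuous xs and 0/1 group labels."""
    ones = [x for x, g in zip(xs, groups) if g == 1]
    zeros = [x for x, g in zip(xs, groups) if g == 0]
    n = len(xs)
    mean_all = sum(xs) / n
    sd_pop = sqrt(sum((x - mean_all) ** 2 for x in xs) / n)
    p = len(ones) / n  # proportion of cases coded 1
    mean_diff = sum(ones) / len(ones) - sum(zeros) / len(zeros)
    return mean_diff / sd_pop * sqrt(p * (1 - p))


# Hypothetical monotonic but non-linear data: y is the square of x.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
r_linear = pearson_r(x, y)    # below 1: the points do not lie on a line
rho_rank = spearman_rho(x, y)  # 1: the rank order agrees perfectly

# Hypothetical test scores with a 0/1 group label.
scores = [2, 4, 5, 7, 9, 10]
group = [0, 0, 0, 1, 1, 1]
r_pb = point_biserial(scores, group)
r_xy = pearson_r(scores, group)  # same value as r_pb

print(r_linear, rho_rank, r_pb, r_xy)
```

The first pair of results illustrates why ranks help when the relationship is monotonic but not linear. The second pair illustrates the equivalence between the point-biserial and Pearson coefficients when the dichotomous variable is coded 0 and 1.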