Uploaded by nagadharshini2001

CORRELATION

Correlation
A measure of the strength of association among and between variables.
Correlation co-efficient or correlation index (r)
Expresses the strength (degree) of association between variables.
Regression (r²)
Predicts the value of one variable given a value of another variable.
CORRELATION (Association)
 Co – Together ; Relation – Connection
 Meaning of correlation:
Correlation is defined as the relationship between two or more variables.
 Definition:
“The degree of association between two variables”
“A measure of the strength of association among and between variables”
 Example:
1) Income and standard of living of a person
2) Monsoon and agricultural production at a particular season
3) Relationship between price and demand
 Uses of correlation:
Before dealing with the various methods of correlation, it is necessary to know
the various uses of correlation in statistical analysis, which can be cited as follows:
1) It is used in deriving precisely the degree and direction of relationship between
variables like price and demand, advertising expenditure and sales, rainfall and
crop yield, etc.
2) It is used in developing the concept of regression, and ratio of variation which help
in estimating the values of one variable for a given value of another variable.
3) It is used in reducing the range of uncertainty in the matter of prediction.
4) It is used in presenting the average relationship between any two variables through
a single value of co-efficient of correlation.
5) In the field of economics it is used in understanding the economic behaviour, and
locating the important variables on which the others depend.
6) In the field of business it is used advantageously to estimate the cost of sales,
volume of sales, sales price, and any other values on the basis of some other
variables which are financially related to each other.
7) In the field of science and philosophy, also, the methods of correlation are
profusely used in making progressive developments in the respective lines.
8) In the field of nature also, it is used in observing the multiplicity of the inter-related
forces.
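Use 2 above (estimating one variable from another through regression) can be sketched as follows. This is a minimal illustration, not a method taken from these notes: the function name `fit_line` and the advertising/sales figures are invented for the example, and the line is fitted by ordinary least squares.

```python
def fit_line(xs, ys):
    """Least-squares line y = a + b*x; returns (a, b, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products and squares of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    b = sxy / sxx                 # slope
    a = mean_y - b * mean_x       # intercept
    r_squared = sxy ** 2 / (sxx * syy)
    return a, b, r_squared

# Hypothetical data, invented for illustration only.
advertising = [1, 2, 3, 4, 5]
sales = [2, 4, 5, 4, 5]

a, b, r_squared = fit_line(advertising, sales)
predicted_sales = a + b * 6       # estimate sales for advertising = 6
print(a, b, r_squared, predicted_sales)
```

Here r_squared is the square of the correlation coefficient r between the two variables.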
 Types of Correlation:
1. In terms of direction of variables
Scatter plots are constructed by plotting two variables along the horizontal (x)
and vertical (y) axes.
Note that the more closely the cluster of dots represents a straight line, the
stronger the correlation.
POSITIVE CORRELATION
Meaning:
The two random variables increase (or decrease) together.
Example:
There is a positive correlation between height and weight: weight increases as height
increases.
NEGATIVE CORRELATION
Meaning:
One of the random variables increases as the other decreases.
Example:
There is a negative correlation between speed and the amount of time it takes to get
somewhere: as speed increases, it takes a shorter amount of time to get to a destination.
NO CORRELATION
Meaning:
There is no linear relationship between the two random variables.
Example:
There is no correlation between being able to write in cursive and the number of fish in
the ocean.
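The three directions can be seen in the sign of the correlation coefficient r. The sketch below assumes small made-up data sets (the height, weight, speed and travel-time values are invented) and a hand-written helper `pearson_r`, which is not part of the notes.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data, invented for illustration only.
height_cm = [150, 160, 170, 180, 190]
weight_kg = [52, 58, 69, 77, 85]        # rises with height
speed_kmh = [20, 40, 60, 80, 100]
travel_min = [300, 150, 100, 75, 60]    # time for a 100 km trip falls as speed rises
x = [1, 2, 3, 4, 5]
y = [2, 1, 3, 1, 2]                     # no linear trend

r_positive = pearson_r(height_cm, weight_kg)
r_negative = pearson_r(speed_kmh, travel_min)
r_none = pearson_r(x, y)
print(r_positive, r_negative, r_none)
```

The first value is close to +1, the second is clearly negative, and the third is zero up to rounding.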
2. In terms of number of variables
SIMPLE CORRELATION
One dependent variable and one independent variable.
Meaning:
Correlation is said to be simple if only two variables are analysed.
Example:
Correlation is simple when it is studied between demand and supply, or between income
and expenditure.
PARTIAL CORRELATION
One dependent variable and more than one independent variable, but only one independent
variable is considered and the other independent variables are held constant.
Meaning:
Three or more variables are considered for analysis, but only two influencing variables
are studied and the rest of the influencing variables are kept constant.
Example:
Correlation analysis is done with demand, supply and income, where income is kept
constant.
MULTIPLE CORRELATION
One dependent variable and more than one independent variable.
Meaning:
In multiple correlation, three or more variables are studied simultaneously.
Example:
When rainfall, production of rice and price of rice are studied simultaneously, this is
known as multiple correlation.
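Holding one variable constant, as in partial correlation, can be sketched with the standard first-order partial correlation formula. The formula is not given in these notes and is brought in here as an assumption; the function name and the three input correlations are also invented for illustration.

```python
from math import sqrt

def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation between x and y with z held constant (first-order formula)."""
    numerator = r_xy - r_xz * r_yz
    denominator = sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    return numerator / denominator

# Hypothetical pairwise correlations: x = demand, y = supply, z = income.
r_demand_supply = 0.8
r_demand_income = 0.6
r_supply_income = 0.5

r_partial = partial_correlation(r_demand_supply, r_demand_income, r_supply_income)
print(round(r_partial, 4))
```

With these inputs the partial correlation is lower than the simple correlation of 0.8, because part of the association between demand and supply is shared with income.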
3. In terms of shape
Distinction between linear and non – linear correlation is based upon the constancy of
the ratio of change between the variables.
Linear Correlation
Meaning:
If the amount of change in one
variable tends to bear constant ratio
to the amount of change in the other
variable then the Correlation is said
to be linear.
In other words, when all the points
on the scatter diagram tend to lie
near a line which looks like a
straight line, the correlation is said
to be linear.
Example:
When the amount of output in a
factory is doubled by doubling the
number of workers, this is an
example of linear correlation.
Non – Linear Correlation
Correlation is said to be non linear if the ratio of
change is not constant. In other words, when all the
points on the scatter diagram tend to lie near a smooth
curve, the correlation is said to be non linear
(curvilinear).
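The difference matters in practice because Pearson's r only measures linear association. In the sketch below (invented data, hand-written helper `pearson_r`), y = 2x gives r = 1, while y = x² over a symmetric range gives r = 0 even though y is completely determined by x.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

xs = [-3, -2, -1, 0, 1, 2, 3]
linear = [2 * x for x in xs]        # constant ratio of change
curved = [x ** 2 for x in xs]       # ratio of change is not constant

r_linear = pearson_r(xs, linear)
r_curved = pearson_r(xs, curved)
print(r_linear, r_curved)
```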
 Representation of Correlation:
Correlation between two random variables is typically presented graphically using a
scatter plot, or numerically using a correlation coefficient.
SCATTER DIAGRAM
A scatter diagram shows the relationship between two variables graphically. It helps to
identify the direction of the association between the two variables under study and gives a
rough visual impression of its strength (strong or weak), but it does not give a precise
measure of the intensity of the correlation. That is calculated by the correlation coefficient,
which gives both direction and intensity.
CORRELATION COEFFICIENT
The index of the degree of relationship between two continuous variables is known as
correlation coefficient (r).
 It was developed by Karl Pearson
 It is also called Pearson’s coefficient
 It is also known as the product moment correlation
Assumptions of correlation co-efficient:
 The variables under study are continuous random variables and they are normally
distributed
 The relationship between the variables is linear
 Each pair of observations is independent of the other pairs
Properties of correlation co-efficient:
 This is unit free measure and is denoted by r.
 Correlation co-efficient is not affected by origin or scale or both
 It ranges from -1 to +1
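The second property can be checked numerically. The sketch below uses invented temperature and sales figures and a hand-written helper `pearson_r`. It shifts the origin and rescales both variables with positive factors, which leaves r unchanged. It also shows one caveat that is an addition to the notes: multiplying one variable by a negative factor reverses the sign of r.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data, invented for illustration only.
celsius = [10, 15, 20, 25, 30]
ice_cream_sales = [12, 18, 21, 30, 33]

# Change of origin and scale: Celsius to Fahrenheit, sales in hundreds.
fahrenheit = [32 + 1.8 * c for c in celsius]
sales_in_hundreds = [s / 100 for s in ice_cream_sales]

r_original = pearson_r(celsius, ice_cream_sales)
r_rescaled = pearson_r(fahrenheit, sales_in_hundreds)
r_flipped = pearson_r([-c for c in celsius], ice_cream_sales)  # negative scale factor
print(r_original, r_rescaled, r_flipped)
```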
THE ABSOLUTE VALUE OF THE CORRELATION COEFFICIENT GIVES US THE
STRENGTH.
THE LARGER THE NUMBER, THE STRONGER THE RELATIONSHIP .
FOR EXAMPLE, |-.75| = .75, WHICH HAS A STRONGER RELATIONSHIP THAN .65.
Value of r and its interpretation (from r = -1 to r = +1):
Perfect negative correlation (r = -1)
Strong negative correlation
Moderate negative correlation
Weak negative correlation
No correlation (r = 0)
Weak positive correlation
Moderate positive correlation
Strong positive correlation
Perfect positive correlation (r = +1)
TYPES OF CORRELATION COEFFICIENT
Correlation coefficients are classified as parametric or non-parametric. The choice of
coefficient depends on the measurement level of the two variables:

Coefficient        Variable 1        Variable 2
Karl Pearson       Interval/Ratio    Interval/Ratio
Spearman's Rank    Ordinal           Ordinal
Point Biserial     Dichotomous       Interval/Ratio
Phi Correlation    Dichotomous       Dichotomous
Chi Square         Nominal           Nominal
Measurement Level (from qualitative to quantitative): Nominal, Ordinal, Interval, Ratio
Nominal – Qualitative/Categorical, e.g. sex, colour
1. The nominal variable is the most basic level of measurement.
2. It is also known as categorical or qualitative.
3. Examples: sex, colour
4. Nominal variables can be stored as a word or text or given a numerical code.
5. To summarise nominal data we use frequency or percentage, but we cannot find its mean.
6. Represented graphically as a pie chart or bar diagram.
Ordinal:
1. Examples: rank, satisfaction, fanciness, likelihood
2. The gap between one value and the next differs. That is, the gap between unsatisfied and very
unsatisfied may be small and the gap between unsatisfied and satisfied may be large.
3. Represented graphically as a bar diagram; a pie chart must not be used.
Interval/Ratio:
1. The most precise level of measurement is interval/ratio.
2. Examples: number of persons, weight, age and size
3. Interval/Ratio data is also known as scale, quantitative or parametric.
4. It may be discrete or continuous.
5. Represented graphically as a bar chart.
COVARIANCE:
In probability theory and statistics, covariance is a measure of the joint variability of two
random variables.
Covariance is measured in units. Those units are computed by multiplying the units of two
variables.
 Positive covariance: Indicates that two variables tend to move in the same direction.
 Negative covariance: Reveals that two variables tend to move in inverse directions.
The sign of the covariance therefore shows the tendency in the linear relationship between the
variables. The magnitude of the covariance is not easy to interpret because it is not normalized
and hence depends on the magnitudes of the variables.
The normalized version of the covariance, the correlation coefficient, however, shows by its
magnitude the strength of the linear relation.
Formula for covariance:
The covariance between two random variables X and Y can be calculated using
the following formula (for population):
$$\mathrm{Cov}(X,Y) = \frac{\sum_{i=1}^{N}(X_i-\bar{X})(Y_i-\bar{Y})}{N}$$
The covariance of two random variables is a population parameter that can be seen as a
property of their joint probability distribution.
For a sample covariance, the formula is slightly adjusted
$$\mathrm{Cov}(X,Y) = \frac{\sum_{i=1}^{n}(X_i-\bar{X})(Y_i-\bar{Y})}{n-1}$$
The sample covariance, which in addition to serving as a descriptor of the sample, also serves
as an estimated value of the population parameter.
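The two divisors can be compared on a small data set. This is a minimal sketch with invented numbers; the function name `covariance` and its `sample` flag are made up for the example. The population form divides by the number of observations and the sample form divides by one less.

```python
def covariance(xs, ys, sample=True):
    """Covariance of two equal-length sequences.

    sample=True divides by n - 1 (sample covariance);
    sample=False divides by n (population covariance).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    total = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return total / (n - 1 if sample else n)

# Hypothetical data, invented for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

cov_population = covariance(x, y, sample=False)   # 20 / 5 = 4.0
cov_sample = covariance(x, y, sample=True)        # 20 / 4 = 5.0
print(cov_population, cov_sample)
```

Both values are positive, so the two variables tend to move in the same direction.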
“Covariance” indicates the direction of the linear relationship between variables.
“Correlation”, on the other hand, measures both the strength and direction of the linear
relationship between two variables.
Correlation is a function of the covariance.
Correlation is the scaled measure of covariance. It is dimensionless. In other words, the
correlation coefficient is always a pure value and not measured in any units.
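The effect of units can be illustrated as follows. In this sketch (invented height and weight data, hypothetical helper names) the same heights are expressed in centimetres and in metres. The covariance changes by a factor of 100, while the correlation, obtained by dividing the covariance by the product of the standard deviations, stays the same.

```python
from math import sqrt

def population_cov(xs, ys):
    """Population covariance (divides by n)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n

def population_sd(xs):
    """Population standard deviation (divides by n)."""
    return sqrt(population_cov(xs, xs))

def correlation(xs, ys):
    """Covariance scaled by the product of the standard deviations."""
    return population_cov(xs, ys) / (population_sd(xs) * population_sd(ys))

# Hypothetical data, invented for illustration only.
height_cm = [150, 160, 170, 180, 190]
weight_kg = [52, 58, 69, 77, 85]
height_m = [h / 100 for h in height_cm]

cov_cm = population_cov(height_cm, weight_kg)   # units: cm * kg
cov_m = population_cov(height_m, weight_kg)     # units: m * kg
r_cm = correlation(height_cm, weight_kg)        # no units
r_m = correlation(height_m, weight_kg)          # no units
print(cov_cm, cov_m, r_cm, r_m)
```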
KARL PEARSON:
 It is a quantitative method of calculating correlation.
 The Pearson correlation coefficient (named for Karl Pearson) can be used to
summarize the strength of the linear relationship between two data samples.
 The Pearson's correlation coefficient is calculated as the covariance of the two
variables divided by the product of the standard deviation of each data
sample.
$$\rho(X,Y) = \frac{\mathrm{Cov}(X,Y)}{\sigma_X\,\sigma_Y}$$
Where:
 ρ(X,Y) – the correlation between the variables X and Y
 Cov(X,Y) – the covariance between the variables X and Y
 σX – the standard deviation of the X-variable
 σY – the standard deviation of the Y-variable
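Substituting the definitions of covariance and standard deviation gives

$$r = \frac{\frac{1}{N}\sum_{i=1}^{N}(X_i-\bar{X})(Y_i-\bar{Y})}{\sqrt{\frac{1}{N}\sum_{i=1}^{N}(X_i-\bar{X})^2}\,\sqrt{\frac{1}{N}\sum_{i=1}^{N}(Y_i-\bar{Y})^2}}$$

The factor 1/N cancels between numerator and denominator, so that

$$r = \frac{\sum_{i=1}^{N}(X_i-\bar{X})(Y_i-\bar{Y})}{\sqrt{\sum_{i=1}^{N}(X_i-\bar{X})^2}\,\sqrt{\sum_{i=1}^{N}(Y_i-\bar{Y})^2}}$$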
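Pearson's r can be computed either from sums of deviations or as covariance divided by the product of the standard deviations. The sketch below (invented data, hypothetical function names) computes both and shows that the divisor used in the covariance and standard deviations, n or n - 1, cancels out.

```python
from math import sqrt

def pearson_from_sums(xs, ys):
    """r = sum(dx*dy) / (sqrt(sum(dx^2)) * sqrt(sum(dy^2)))."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sqrt(sxx) * sqrt(syy))

def pearson_from_cov(xs, ys, divisor):
    """r = Cov(X, Y) / (sd_X * sd_Y), all computed with the same divisor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / divisor
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / divisor)
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / divisor)
    return cov / (sd_x * sd_y)

# Hypothetical data, invented for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

r_sums = pearson_from_sums(x, y)
r_population = pearson_from_cov(x, y, divisor=n)
r_sample = pearson_from_cov(x, y, divisor=n - 1)
print(r_sums, r_population, r_sample)
```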
Merits of Karl Pearson correlation Co-efficient:
 This method not only indicates the presence or absence of correlation between any two
variables but also determines the exact extent, or degree, to which they are correlated.
 Under this method, we can also ascertain the direction of the correlation, i.e. whether the
correlation between the two variables is positive or negative.
 This method enables us to estimate the value of a dependent variable with reference to a
particular value of an independent variable through regression equations.
 This method has a lot of algebraic properties, because of which the calculation of the
co-efficient of correlation and a host of other related factors, viz. the co-efficient of
determination, are made easy.
Demerits of Karl Pearson Correlation Co-efficient:
 It is comparatively difficult to calculate, as its computation involves intricate algebraic
methods of calculation.
 It is very much affected by the values of the extreme items.
 It is based on a large number of assumptions, viz. linear relationship, cause and effect
relationship etc., which may not always hold good.
 It is very likely to be misinterpreted, particularly in the case of homogeneous data.
 In comparison to the other methods, it takes much time to arrive at the results.
 It is subject to probable error, which its propounder himself admits, and therefore it is
always advisable to compute its probable error while interpreting its results.
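The sensitivity to extreme items can be demonstrated with a small experiment. In this sketch the data and the outlying point (20, -30) are invented, and `pearson_r` is a hand-written helper. A single extreme observation is enough to turn a clearly positive correlation into a negative one.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data, invented for illustration only.
x = [1, 2, 3, 4, 5, 6]
y = [2, 1, 4, 3, 6, 5]

r_without_outlier = pearson_r(x, y)
r_with_outlier = pearson_r(x + [20], y + [-30])   # one extreme item added
print(r_without_outlier, r_with_outlier)
```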
SPEARMAN’S RANK CORRELATION:
This method is an improvement over Karl Pearson’s method of correlation in that
(i) It does not need the quantitative expression of the data.
(ii) It does not assume that the population under study is normally distributed.
This method was introduced by the British Psychologist Charles Edward Spearman in
1904. Under this method, correlation is measured on the basis of the ranks rather than
the original values of the variables. For this, the values of the two variables are first
converted into ranks in a particular order.
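The ranking step can be sketched as follows. The notes do not state a formula, so this example assumes the commonly used rank-difference formula, rho = 1 - 6Σd² / (n(n² - 1)), which applies when there are no tied values. The data and function names are invented for illustration.

```python
def ranks(values):
    """Rank values from 1 (smallest) to n (largest); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def spearman_rho(xs, ys):
    """Spearman's rank correlation via the rank-difference formula (no ties)."""
    n = len(xs)
    rank_x = ranks(xs)
    rank_y = ranks(ys)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

# Hypothetical scores, invented for illustration only.
judge_a = [10, 20, 30, 40, 50]
judge_b = [1, 3, 2, 5, 4]

rho = spearman_rho(judge_a, judge_b)   # sum of d^2 is 4, so rho = 1 - 24/120
print(rho)                             # 0.8
```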
POINT – BISERIAL CORRELATION CO-EFFICIENT:
The point-biserial correlation is mathematically equivalent to the Pearson (product
moment) correlation; that is, if we have one continuously measured variable X and a
dichotomous variable Y, the Pearson correlation computed on X and Y is the
point-biserial correlation.
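The equivalence with Pearson's r can be checked numerically. The sketch below assumes the usual group-means form of the point-biserial coefficient, r_pb = (M1 - M0) / s_n × √(n1 n0 / n²), where s_n is the standard deviation of X computed with divisor n. That formula is not stated in these notes, and the scores and pass/fail codes are invented.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def point_biserial(xs, groups):
    """Point-biserial r from group means; groups must be coded 0 or 1."""
    n = len(xs)
    ones = [x for x, g in zip(xs, groups) if g == 1]
    zeros = [x for x, g in zip(xs, groups) if g == 0]
    mean_all = sum(xs) / n
    s_n = sqrt(sum((x - mean_all) ** 2 for x in xs) / n)   # divisor n
    mean_1 = sum(ones) / len(ones)
    mean_0 = sum(zeros) / len(zeros)
    return (mean_1 - mean_0) / s_n * sqrt(len(ones) * len(zeros) / n ** 2)

# Hypothetical data: exam score (continuous) and pass/fail (dichotomous).
scores = [52, 60, 58, 71, 75, 80]
passed = [0, 0, 0, 1, 1, 1]

r_pb = point_biserial(scores, passed)
r_pearson = pearson_r(scores, passed)
print(r_pb, r_pearson)
```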
PHI CORRELATION CO-EFFICIENT:
The phi coefficient is a symmetrical statistic, which means the independent variable and
the dependent variable are interchangeable.
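The symmetry can be checked on a 2×2 table. This sketch assumes the standard formula for two dichotomous variables, phi = (ad - bc) / √((a+b)(c+d)(a+c)(b+d)), which is not given in these notes; the cell counts are invented. Swapping the roles of the two variables transposes the table and leaves phi unchanged.

```python
from math import sqrt

def phi_coefficient(table):
    """Phi for a 2x2 table given as [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    numerator = a * d - b * c
    denominator = sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return numerator / denominator

# Hypothetical counts: rows are one variable, columns the other.
table = [[20, 10],
         [5, 15]]
# Swapping the roles of the two variables transposes the table.
transposed = [[20, 5],
              [10, 15]]

phi = phi_coefficient(table)
phi_swapped = phi_coefficient(transposed)
print(phi, phi_swapped)
```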