Covariance and Correlation

Correlation and Regression
1
Bivariate data
When measurements on two characteristics are to be studied
simultaneously because of their interdependence, we get
observations in pairs.
Such a set of data in pairs is called bivariate data.
2
COVARIANCE
While variance measures the variation among the
observations in a data set, COVARIANCE
measures the joint variation among the pairs of
observations in a bivariate data set.
i.e. the sign of the covariance shows the direction of the
linear relationship between two variables.
But its magnitude depends on the units of measurement,
so it cannot be used to compare the strength of the linear
relationship between different pairs of variables. Hence,
there is a necessity to study the concept of
correlation.
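The unit dependence of covariance can be illustrated with a minimal Python sketch. The function name `covariance`, the choice of the population form (dividing by n rather than n - 1), and the height and weight figures are all illustrative assumptions, not taken from the slides.

```python
def covariance(xs, ys):
    """Population covariance of paired observations (divides by n; an assumption)."""
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("xs and ys must be non-empty and of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Average product of the deviations from the two means.
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


# Made-up bivariate data: the same heights expressed in metres and in centimetres.
heights_m = [1.50, 1.60, 1.70, 1.80]
weights_kg = [50.0, 58.0, 66.0, 74.0]
heights_cm = [h * 100 for h in heights_m]

cov_m = covariance(heights_m, weights_kg)
cov_cm = covariance(heights_cm, weights_kg)
print(cov_m, cov_cm)  # the second value is 100 times the first
```

The relationship between height and weight is identical in both cases, yet the covariance changes by a factor of 100 when the unit changes, which is why a unit-free measure is needed for comparison.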
3
CORRELATION
Correlation analysis: When changes in one variable
are accompanied by changes in the other variable,
the two variables are said to be correlated.
4
Correlation
• Positive: Perfect or Imperfect (Strong or Weak)
• Zero
• Negative: Perfect or Imperfect (Strong or Weak)
5
Methods of assessing Correlation
SCATTER DIAGRAM
A scatter diagram is a graphical method of
assessing correlation between two variables.
6
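PERFECT POSITIVE CORRELATION
[Scatter diagram: Y plotted against X]
7
PERFECT NEGATIVE CORRELATION
[Scatter diagram: Y plotted against X]
8
IMPERFECT POSITIVE CORRELATION
[Scatter diagram: Y plotted against X]
9
IMPERFECT NEGATIVE CORRELATION
[Scatter diagram: Y plotted against X]
10
NO CORRELATION
[Scatter diagram: Y plotted against X]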
11
• Correlation is measured with the help
of the correlation coefficient r.
• Its value always lies between
-1 and +1, i.e. -1 ≤ r ≤ 1
12
Correlation
• Positive Correlation: 0 < r ≤ 1
  • Perfect Positive Correlation: r = 1
  • Imperfect Positive Correlation: 0 < r < 1
    • Weak Positive: r tends to 0
    • Strong Positive: r tends to 1
• No Correlation: r = 0
• Negative Correlation: -1 ≤ r < 0
  • Perfect Negative Correlation: r = -1
  • Imperfect Negative Correlation: -1 < r < 0
    • Weak Negative: r tends to 0
    • Strong Negative: r tends to -1
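The classification by the value of r can be sketched as a small Python function. The function name `classify_correlation` is hypothetical, and the `strong_cutoff` of 0.5 is an arbitrary illustrative threshold: the slides say only that r "tends to" 0 or to ±1 and give no numeric boundary between weak and strong.

```python
def classify_correlation(r, strong_cutoff=0.5):
    """Label a correlation coefficient r by type.

    strong_cutoff is an assumed, illustrative boundary between "weak"
    and "strong"; the source gives no numeric threshold.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    if r == 0:
        return "no correlation"
    direction = "positive" if r > 0 else "negative"
    if abs(r) == 1:
        return f"perfect {direction} correlation"
    strength = "strong" if abs(r) >= strong_cutoff else "weak"
    return f"{strength} {direction} correlation"


print(classify_correlation(1.0))   # perfect positive correlation
print(classify_correlation(-0.9))  # strong negative correlation
print(classify_correlation(0.2))   # weak positive correlation
```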
13
Karl Pearson’s Coefficient of correlation:
Karl Pearson defined the coefficient of correlation
as a measure of the intensity or degree of linear
relationship between two variables.
Let X and Y be the two variables with n pairs of
observations; the pairs are represented as
$(x_i, y_i)$, $i = 1, 2, \dots, n$
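The slide does not show the formula itself. As a hedged sketch, the standard product-moment definition, r = Σ(x - x̄)(y - ȳ) / √(Σ(x - x̄)² · Σ(y - ȳ)²), can be computed from the n pairs as follows. The function name `pearson_r` and the sample data are illustrative assumptions.

```python
import math


def pearson_r(xs, ys):
    """Product-moment correlation coefficient for paired observations."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two pairs of equal-length data")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products and squares of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Made-up data: y rises (or falls) exactly linearly with x.
x = [1, 2, 3, 4, 5]
r_up = pearson_r(x, [2, 4, 6, 8, 10])
r_down = pearson_r(x, [10, 8, 6, 4, 2])
print(r_up, r_down)
```

Because the deviations are divided by their own spread, r has no units, unlike the covariance.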
14
Spurious Correlation:
When the value of the correlation coefficient indicates
a strong relationship, but no logical relationship
exists between the two variables, such a correlation
is called Spurious Correlation.
Ex. The number of students receiving graduate degrees
every year and the number of auto accidents in the
city.
15
Coefficient of Determination
The square of the correlation coefficient r, written
$r^2$, is known as the coefficient of determination. It indicates
the extent to which variation in one variable is
explained by the variation in the other.
Ex: If the correlation coefficient between x and y is 0.9,
the coefficient of determination will be 0.81. It implies
that 81% of the variation in y is explained by the
variation in x and the remaining 19% is due to
other factors. The quantity $1 - r^2$ is referred to as the coefficient
of nondetermination.
The square root of the coefficient of nondetermination,
$\sqrt{1 - r^2}$, is known as the coefficient of alienation.
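The arithmetic for r = 0.9 can be checked with a short Python sketch. The function name `determination_summary` is a hypothetical helper introduced for illustration.

```python
import math


def determination_summary(r):
    """Return (determination, nondetermination, alienation) for a given r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    determination = r ** 2                     # share of variation explained
    nondetermination = 1 - determination       # share left unexplained
    alienation = math.sqrt(nondetermination)   # square root of the unexplained share
    return determination, nondetermination, alienation


det, nondet, alien = determination_summary(0.9)
print(round(det, 4), round(nondet, 4), round(alien, 4))  # 0.81 0.19 0.4359
```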
16
Rank Correlation
Sometimes the data on two variables cannot be
measured quantitatively. In such situations the
observations can be ranked. Karl Pearson’s
correlation coefficient is not an appropriate
measure for qualitative data. Hence Spearman
defined a coefficient of correlation for qualitative
data, called Spearman’s Rank Correlation
coefficient.
E.g. ranks given by judges in a beauty contest.
17
Spearman’s Rank Correlation Coefficient (R)
$$R = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)}$$
where $d_i = X_i - Y_i$
$X_i$ : Rank assigned by Judge 1
$Y_i$ : Rank assigned by Judge 2
$n$ : Number of pairs of observations
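A minimal Python sketch of this formula, assuming no tied ranks. The function name `spearman_rank_correlation` and the two judges' rankings are made up for illustration.

```python
def spearman_rank_correlation(ranks_x, ranks_y):
    """Spearman's R = 1 - 6*sum(d_i^2) / (n*(n^2 - 1)), assuming no ties."""
    n = len(ranks_x)
    if n < 2 or n != len(ranks_y):
        raise ValueError("need at least two pairs of equal-length ranks")
    # d_i = X_i - Y_i, the difference between the two ranks of individual i.
    sum_d_squared = sum((x - y) ** 2 for x, y in zip(ranks_x, ranks_y))
    return 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))


# Hypothetical ranks given to five contestants by two judges.
judge_1 = [1, 2, 3, 4, 5]
judge_2 = [2, 1, 4, 3, 5]
R = spearman_rank_correlation(judge_1, judge_2)
print(round(R, 4))  # 0.8
```

Here the squared differences sum to 4, so R = 1 - 24/120 = 0.8. Identical rankings give R = 1 and exactly reversed rankings give R = -1.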
18
Case of Tied Ranks
A correction factor $\frac{m(m^2 - 1)}{12}$ has to be added to $\sum d_i^2$ for each tie:
$$R = 1 - \frac{6\left[\sum d_i^2 + \frac{m(m^2 - 1)}{12}\right]}{n(n^2 - 1)}$$
where $m$: number of individuals having a tie
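The tie correction can be sketched in Python as follows. Two assumptions are made that the slide does not state: tied individuals are given the average of the ranks they would otherwise occupy, and one correction term is added for every group of ties in either ranking. The names `tie_correction` and `spearman_with_ties` and the data are hypothetical.

```python
from collections import Counter


def tie_correction(ranks):
    """Sum of m*(m^2 - 1)/12 over every group of m tied ranks."""
    return sum(m * (m ** 2 - 1) / 12 for m in Counter(ranks).values() if m > 1)


def spearman_with_ties(ranks_x, ranks_y):
    """Spearman's R with the correction added to sum(d_i^2) for each tie."""
    n = len(ranks_x)
    if n < 2 or n != len(ranks_y):
        raise ValueError("need at least two pairs of equal-length ranks")
    sum_d_squared = sum((x - y) ** 2 for x, y in zip(ranks_x, ranks_y))
    # Assumption: ties in both rankings contribute a correction term.
    correction = tie_correction(ranks_x) + tie_correction(ranks_y)
    return 1 - 6 * (sum_d_squared + correction) / (n * (n ** 2 - 1))


# Hypothetical ranks: the second and third individuals tie in the first
# ranking, so each receives the average rank (2 + 3) / 2 = 2.5.
ranks_a = [1, 2.5, 2.5, 4]
ranks_b = [1, 2, 3, 4]
R_tied = spearman_with_ties(ranks_a, ranks_b)
print(round(R_tied, 4))  # 0.9
```

In this example Σd² = 0.5 and the single tie of m = 2 adds 2(4 - 1)/12 = 0.5, so R = 1 - 6(1.0)/60 = 0.9.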
19