Correlation
LECTURE
CORRELATION
Outline
Correlation
Rank Correlation
Linear Regression Line
Relation Between Correlation & Regression Coefficients
Relation Between Two Variables X & Y
[Figure: five scatter plots of Y against X. Panels: (a) Linear, (b) Linear, (c) Curvilinear, (d) Curvilinear, (e) No Relationship]
Correlation
Two variables are said to be correlated if they tend to vary simultaneously
in some direction (the same direction or opposite directions).
If both variables tend to increase (or decrease) together, the correlation
is said to be positive.
If one variable tends to increase while the other tends to decrease,
the correlation is said to be negative.
Correlation coefficient
The correlation coefficient is a quantitative measure of the
strength of the linear relationship between two variables.

\[
r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}
\]

r = Sample correlation coefficient
n = Sample size
x = Independent variable
y = Dependent variable
Example of correlation coefficient
         y    x      xy         y²    x²
       487    3   1,461    237,169     9
       445    5   2,225    198,025    25
       272    2     544     73,984     4
       641    8   5,128    410,881    64
       187    2     374     34,969     4
       440    6   2,640    193,600    36
       346    7   2,422    119,716    49
       238    1     238     56,644     1
       312    4   1,248     97,344    16
       269    2     538     72,361     4
       655    9   5,895    429,025    81
       563    6   3,378    316,969    36
Sum  4,855   55  26,091  2,240,687   329
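\[
r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}
  = \frac{12(26,091) - 55(4,855)}{\sqrt{\left[12(329) - (55)^2\right]\left[12(2,240,687) - (4,855)^2\right]}}
  = 0.8325
\]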
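The calculation can be checked with a short script. This is a minimal sketch in plain Python that applies the sums formula to the twelve (x, y) pairs of the example; the function name `pearson_r` is chosen here for illustration and is not part of the lecture.

```python
from math import sqrt

# Data from the worked example (12 paired observations).
x = [3, 5, 2, 8, 2, 6, 7, 1, 4, 2, 9, 6]
y = [487, 445, 272, 641, 187, 440, 346, 238, 312, 269, 655, 563]


def pearson_r(xs, ys):
    """Sample correlation coefficient via the computational (sums) formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(a * b for a, b in zip(xs, ys))
    sum_x2 = sum(a * a for a in xs)
    sum_y2 = sum(b * b for b in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


r = pearson_r(x, y)
print(round(r, 4))  # 0.8325
```

A value this close to +1 indicates a strong positive linear relationship between x and y in this sample.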
Regression
Correlation describes the strength of a linear relationship between two variables.
Regression analysis describes the relationship between two (or more) variables.
Examples:
1. Income and educational level
2. Demand for electricity and the weather
Regression tells us how to draw the straight line.
Definition: The relationship between the expected value of the dependent
variable Y and the independent variable X is known as the
regression line of Y on X.
Interpretation of Regression line
[Figure: straight line Y = bX + a plotted against X, where b = slope = (change in Y) / (change in X) and a = Y-intercept]
Interpretation of Regression coefficient
The interpretation of the regression coefficient b is that it gives
the average change in the dependent variable for a unit increase
in the independent variable.
The slope coefficient may be positive or negative, depending on the
relationship between the two variables.
ESTIMATED REGRESSION
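\[
\hat{y} = a + b x
\]

\[
b = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sum x^2 - \dfrac{\left(\sum x\right)^2}{n}}
\qquad\text{or}\qquad
b = r \frac{S_y}{S_x}
\]

\[
a = \bar{y} - b \bar{x}
\]

ŷ = Estimated, or predicted, y value
a = Unbiased estimate of the regression intercept
b = Unbiased estimate of the regression slope
x = Value of the independent variable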
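The two expressions for b describe the same quantity. The short derivation below is a sketch under two assumptions that the slide does not state explicitly: S_x and S_y are the sample standard deviations of x and y, and the shorthand S_xx, S_yy, S_xy (introduced here only for this derivation) stands for the corrected sums $S_{xx} = \sum x^2 - (\sum x)^2/n$, $S_{yy} = \sum y^2 - (\sum y)^2/n$ and $S_{xy} = \sum xy - \sum x \sum y / n$.

```latex
\begin{aligned}
r &= \frac{S_{xy}}{\sqrt{S_{xx}\, S_{yy}}}, \qquad
S_x = \sqrt{\frac{S_{xx}}{n-1}}, \qquad
S_y = \sqrt{\frac{S_{yy}}{n-1}}, \\[4pt]
r\,\frac{S_y}{S_x}
  &= \frac{S_{xy}}{\sqrt{S_{xx}\, S_{yy}}} \cdot \sqrt{\frac{S_{yy}}{S_{xx}}}
   = \frac{S_{xy}}{S_{xx}}
   = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sum x^2 - \dfrac{\left(\sum x\right)^2}{n}}
   = b .
\end{aligned}
```

Because S_x and S_y are positive, the slope b always has the same sign as r.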
Example of regression line Y on X (Slide 16)
         y    x      xy         y²    x²
       487    3   1,461    237,169     9
       445    5   2,225    198,025    25
       272    2     544     73,984     4
       641    8   5,128    410,881    64
       187    2     374     34,969     4
       440    6   2,640    193,600    36
       346    7   2,422    119,716    49
       238    1     238     56,644     1
       312    4   1,248     97,344    16
       269    2     538     72,361     4
       655    9   5,895    429,025    81
       563    6   3,378    316,969    36
Sum  4,855   55  26,091  2,240,687   329
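\[
b = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sum x^2 - \dfrac{\left(\sum x\right)^2}{n}}
  = \frac{26,091 - \dfrac{55(4,855)}{12}}{329 - \dfrac{(55)^2}{12}}
  = 49.9101
\]

\[
a = \bar{y} - b \bar{x} = 404.5833 - 49.9101(4.5833) = 175.8288
\]

\[
\hat{y} = 175.8288 + 49.9101\,x
\]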
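The fitted line can be reproduced numerically. The sketch below is plain Python using the same sums formulas for b and a; the function name `fit_line` is an illustrative choice, not something defined in the lecture.

```python
# Data from the worked regression example (12 paired observations).
x = [3, 5, 2, 8, 2, 6, 7, 1, 4, 2, 9, 6]
y = [487, 445, 272, 641, 187, 440, 346, 238, 312, 269, 655, 563]


def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y_hat = a + b*x."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(xi * yi for xi, yi in zip(xs, ys))
    sum_x2 = sum(xi * xi for xi in xs)
    # Slope: corrected sum of products over corrected sum of squares of x.
    b = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
    # Intercept: the fitted line passes through the point of means.
    a = sum_y / n - b * sum_x / n
    return a, b


a, b = fit_line(x, y)
print(f"y_hat = {a:.4f} + {b:.4f} x")  # y_hat = 175.8288 + 49.9101 x
```

Read this way, each unit increase in x is associated with an average increase of about 49.91 in y.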
The principle of least squares
The principle of least squares consists of determining the values of the
unknown parameters that minimize the sum of squared errors, that is,

\[
\sum (y - \hat{y})^2
\]

should be minimized.
A residual (or error) is the difference between the actual value of
the dependent variable and the value predicted by the regression line:

\[
e = y - \hat{y}
\]
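To make the principle concrete, the sketch below computes the residuals and the sum of squared errors (SSE) for the least-squares line fitted to the data of the regression example, then compares it with a few other lines. The perturbations in `shifts` are arbitrary values picked for illustration; the names `sse`, `a_hat` and `b_hat` are likewise assumptions of this sketch.

```python
# Data from the worked regression example.
x = [3, 5, 2, 8, 2, 6, 7, 1, 4, 2, 9, 6]
y = [487, 445, 272, 641, 187, 440, 346, 238, 312, 269, 655, 563]


def sse(a, b, xs, ys):
    """Sum of squared residuals (y - y_hat)^2 for the line y_hat = a + b*x."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(xs, ys))


# Least-squares estimates from the sums formulas.
n = len(x)
sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_x2 = sum(xi * xi for xi in x)
b_hat = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
a_hat = sum_y / n - b_hat * sum_x / n

# Residuals of the fitted line and its sum of squared errors.
residuals = [yi - (a_hat + b_hat * xi) for xi, yi in zip(x, y)]
best = sse(a_hat, b_hat, x, y)

# Arbitrary (intercept shift, slope shift) pairs, for illustration only.
shifts = [(10, 0), (-10, 0), (0, 1), (0, -1), (5, -2)]
worse = [sse(a_hat + da, b_hat + db, x, y) for da, db in shifts]

print(f"SSE at the least-squares line: {best:.1f}")
print("every perturbed line has a larger SSE:", all(w > best for w in worse))
```

The residuals of the least-squares line also sum to zero up to floating-point rounding, which follows from the formula for the intercept a.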
Ranking
An ordered arrangement of objects (or individuals) according to some
characteristic of interest is called a ranking.
The correlation between two sets of rankings is known as rank correlation.
Let the two rankings of n objects with respect to characters A and B be
x1, x2, ..., xn and y1, y2, ..., yn respectively, and
assume that no two objects are given the same
rank (ranking without ties).
Then the rank correlation coefficient r_s is calculated by

\[
r_s = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)}, \qquad d_i = x_i - y_i
\]

If there are ties among the ranks, then for each group of m tied individuals
add the quantity

\[
\frac{m^3 - m}{12}
\]

to $\sum d_i^2$.
Example 1
In a study of the relationship between education level and income, the
following data were obtained. Find the relationship between them.

Sample  Education (X)  Income (Y)
A       Preparatory    25
B       Primary        10
C       College         8
D       Secondary      15
E       Illiterate     50
F       University     60
Without tie
Rank X  Rank Y  di = x - y  di²
5       3        2           4
4       5       -1           1
2       6       -4          16
3       4       -1           1
6       2        4          16
1       1        0           0

∑ di² = 38

\[
r_s = 1 - \frac{6 \times 38}{6(35)} = -0.0857
\]
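The same result can be obtained with a few lines of Python. This sketch takes the two rank columns exactly as tabulated in this example and applies the no-ties formula; the function name `spearman_no_ties` is an illustrative choice.

```python
def spearman_no_ties(rank_x, rank_y):
    """Rank correlation r_s = 1 - 6*sum(d^2) / (n*(n^2 - 1)), valid without ties."""
    n = len(rank_x)
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Rank columns as tabulated in Example 1 (samples A to F).
rank_x = [5, 4, 2, 3, 6, 1]
rank_y = [3, 5, 6, 4, 2, 1]

rs = spearman_no_ties(rank_x, rank_y)
print(round(rs, 4))  # -0.0857
```

A value this close to zero suggests almost no monotonic association between the two rankings in this small sample.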
Example 2
In a study of the relationship between education level and income, the
following data were obtained. Find the relationship between them.

Sample  Education (X)  Income (Y)
A       Preparatory    25
B       Primary        10
C       University      8
D       Secondary      10
E       Secondary      15
F       Illiterate     50
G       University     60
With tie
There are 3 ties, each among 2 observations, i.e. m = 2 for every tie,
so each tie adds (2³ - 2)/12 = 0.5 to ∑ di².

Rank X  Rank Y  di = x - y  di²
5       3        2           4
6       5.5      0.5         0.25
1.5     7       -5.5        30.25
3.5     5.5     -2           4
3.5     4       -0.5         0.25
7       2        5          25
1.5     1        0.5         0.25

∑ di² = 64

\[
r_s = 1 - \frac{6(64 + 0.5 + 0.5 + 0.5)}{7(48)} = -0.1696
\]
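The tie adjustment can be sketched in Python as follows. The code counts how many observations share each rank value, adds (m³ - m)/12 for every tied group in either ranking, and then applies the formula. The helper names `tie_correction` and `spearman_with_ties` are illustrative choices, and the rank columns are taken as tabulated in this example.

```python
from collections import Counter


def tie_correction(ranks):
    """Sum of (m^3 - m)/12 over every group of m observations sharing a rank."""
    return sum((m ** 3 - m) / 12 for m in Counter(ranks).values() if m > 1)


def spearman_with_ties(rank_x, rank_y):
    """Rank correlation with the tie adjustment added to sum(d^2)."""
    n = len(rank_x)
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    correction = tie_correction(rank_x) + tie_correction(rank_y)
    return 1 - 6 * (sum_d2 + correction) / (n * (n ** 2 - 1))


# Rank columns as tabulated in Example 2 (samples A to G); tied
# observations share the average of the ranks they occupy.
rank_x = [5, 6, 1.5, 3.5, 3.5, 7, 1.5]
rank_y = [3, 5.5, 7, 5.5, 4, 2, 1]

rs = spearman_with_ties(rank_x, rank_y)
print(round(rs, 4))  # -0.1696
```

Here two tied groups occur in the X ranking and one in the Y ranking, which gives the three corrections of 0.5 used in the calculation.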
Exercise
The following table gives the distribution of the total population and
of those among them who are totally or partially blind. Find out whether
there is any correlation between age and blindness.

Age     No. of persons (thousands)  Blind
0-10    100                         55
10-20    60                         40
20-30    40                         40
30-40    36                         40
40-50    24                         36
50-60    11                         22
60-70     6                         18
70-80     3                          5
Exercise
For the data given on Slide 16, fit the following curves (if possible) and
decide which is the best fit:

$X = c + dY$                    Linear curve
$Y = a + bX + cX^2$             Parabolic curve
$Y = a + bX + cX^2 + dX^3$      Cubic curve
$Y = a e^{bX}$                  Exponential curve
$Y = a X^b$                     Power curve
$Y = 1/(a + bX)$                Hyperbolic curve

A good line is one that minimizes the sum of squared differences between
the points and the line.