Variability of Bivariate Data

Correlation
• This chapter is on correlation
• We will look at patterns in data on a scatter graph
• We will be looking at how to calculate the variance and co-variance of variables
• We will see how to numerically measure the strength of correlation between two variables
Scatter Graphs
Scatter graphs are a way of representing 2 sets of data. It is then possible to see whether they are related.
Positive Correlation
• As one variable increases, so does the other
Negative Correlation
• As one variable increases, the other decreases
No Correlation
• There seems to be no pattern linking the two variables
6A
In the study of a city, the population density, in people/hectare, and the distance from the city centre, in km, were investigated by choosing sample areas. The results are as follows:

Area                             A     B     C     D     E     F     G     H     I     J
Distance (km)                    0.6   3.8   2.4   3.0   2.0   1.5   1.8   3.4   4.0   0.9
Pop. density (people/hectare)    50    22    14    20    33    47    25    8     16    38

Plot a scatter graph and describe the correlation. Interpret what the correlation means.
[Scatter graph: Pop. density (people/hectare) plotted against distance from centre (km)]
The correlation is negative, which means that as we get further from the city centre, the population density decreases.
Variability of Bivariate Data
We learnt in chapter 3 that:
$$\text{Variance} = \frac{\sum (x - \bar{x})^2}{n}$$
(Although remember that this formula changed to make it easier to use)
In Correlation:
$$S_{xx} = \sum (x - \bar{x})^2$$
‘How x varies’
Similarly for y:
$$S_{yy} = \sum (y - \bar{y})^2$$
‘How y varies’
And you can also calculate the Co-variance of both variables
$$\text{Co-variance} = \frac{\sum (x - \bar{x})(y - \bar{y})}{n}$$
$$S_{xy} = \sum (x - \bar{x})(y - \bar{y})$$
‘How x and y vary together’
6B/C
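As an illustration, the definitions of Sxx, Syy and Sxy can be applied directly to the city data from the scatter graph example. This is only a sketch: the function names `mean`, `s_xx` and `s_xy` are made up for this example and do not come from the slides.

```python
# Sketch: S_xx, S_yy and S_xy computed straight from their definitions.
# Data: the distance / population-density pairs from the city example.
# The helper names below are illustrative, not taken from the slides.

def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)

def s_xx(xs):
    """Sum of squared deviations of xs from their mean."""
    x_bar = mean(xs)
    return sum((x - x_bar) ** 2 for x in xs)

def s_xy(xs, ys):
    """Sum of products of paired deviations from the two means."""
    x_bar, y_bar = mean(xs), mean(ys)
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

distance = [0.6, 3.8, 2.4, 3.0, 2.0, 1.5, 1.8, 3.4, 4.0, 0.9]  # km
density = [50, 22, 14, 20, 33, 47, 25, 8, 16, 38]               # people/hectare

n = len(distance)
print("Sxx =", s_xx(distance))
print("Syy =", s_xx(density))            # same definition, applied to y
print("Sxy =", s_xy(distance, density))
print("variance of x =", s_xx(distance) / n)
print("co-variance =", s_xy(distance, density) / n)
```

Sxy comes out negative here, which agrees with the negative correlation seen on the scatter graph.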
Like in chapter 3, we can use a formula which will make calculations easier
$$\text{Variance} = \frac{\sum (x - \bar{x})^2}{n}$$
BUT:
$$S_{xx} = \sum (x - \bar{x})^2$$
so
$$\text{Variance} = \frac{S_{xx}}{n}$$
$$\text{Variance} = \frac{S_{xx}}{n}$$
Multiply both sides by ‘n’:
$$S_{xx} = n \times \text{Variance}$$
Substitute the easier formula for variance from chapter 3:
$$S_{xx} = n\left(\frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2\right)$$
For the second fraction, square the top and bottom separately:
$$S_{xx} = n\left(\frac{\sum x^2}{n} - \frac{(\sum x)^2}{n^2}\right)$$
Multiplying both fractions by ‘n’ will cancel a ‘divide by n’ from each of them:
$$S_{xx} = \sum x^2 - \frac{(\sum x)^2}{n}$$
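A quick numerical check can make the derivation more convincing. The sketch below compares the definition of Sxx with the shortcut formula on a small data set; the data values and function names are invented for illustration only.

```python
# Sketch: check numerically that the shortcut for S_xx matches the definition.
# The sample values are made up for illustration only.

def s_xx_definition(xs):
    """S_xx as the sum of squared deviations from the mean."""
    x_bar = sum(xs) / len(xs)
    return sum((x - x_bar) ** 2 for x in xs)

def s_xx_shortcut(xs):
    """S_xx as (sum of x^2) minus (sum of x)^2 / n."""
    n = len(xs)
    return sum(x * x for x in xs) - sum(xs) ** 2 / n

sample = [2, 4, 4, 5, 7, 8]
by_definition = s_xx_definition(sample)
by_shortcut = s_xx_shortcut(sample)
print(by_definition, by_shortcut)  # both give 24.0
```

The shortcut needs only the running totals of x and x², which is why it is easier to use by hand.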
These are the formulae for Sxx, Syy and Sxy. You are given these in the formula booklet. You do not need to know how to derive them (like we just did!)
$$S_{xx} = \sum x^2 - \frac{(\sum x)^2}{n}$$
$$S_{yy} = \sum y^2 - \frac{(\sum y)^2}{n}$$
$$S_{xy} = \sum xy - \frac{\sum x \sum y}{n}$$
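The slides derive only the Sxx formula. For completeness, the Sxy formula follows from the same kind of expansion. The steps below are a sketch that uses only the definitions of the two means, $\bar{x} = \sum x / n$ and $\bar{y} = \sum y / n$.

```latex
\begin{align*}
S_{xy} &= \sum (x - \bar{x})(y - \bar{y}) \\
       &= \sum xy - \bar{x}\sum y - \bar{y}\sum x + n\bar{x}\bar{y} \\
       &= \sum xy - \frac{\sum x \sum y}{n} - \frac{\sum x \sum y}{n} + \frac{\sum x \sum y}{n} \\
       &= \sum xy - \frac{\sum x \sum y}{n}
\end{align*}
```

Replacing y with x throughout gives the Sxx formula again.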
Calculate Sxx, Syy and Sxy, based on the following information.
$$n = 12, \quad \sum x = 155, \quad \sum x^2 = 2031, \quad \sum y = 198, \quad \sum y^2 = 3904, \quad \sum xy = 2732$$
$$S_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 2031 - \frac{(155)^2}{12} = 28.92$$
$$S_{yy} = \sum y^2 - \frac{(\sum y)^2}{n} = 3904 - \frac{(198)^2}{12} = 637$$
$$S_{xy} = \sum xy - \frac{\sum x \sum y}{n} = 2732 - \frac{155 \times 198}{12} = 174.5$$
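The same calculation can be sketched in code when only the summary totals are known. The function name `s_values` and its parameter names are assumptions made for this example.

```python
# Sketch: S_xx, S_yy and S_xy from summary totals, using the booklet formulae.
# The function and parameter names are illustrative only.

def s_values(n, sum_x, sum_x2, sum_y, sum_y2, sum_xy):
    """Return (S_xx, S_yy, S_xy) from the six summary statistics."""
    sxx = sum_x2 - sum_x ** 2 / n
    syy = sum_y2 - sum_y ** 2 / n
    sxy = sum_xy - sum_x * sum_y / n
    return sxx, syy, sxy

# Totals from the worked example.
sxx, syy, sxy = s_values(n=12, sum_x=155, sum_x2=2031,
                         sum_y=198, sum_y2=3904, sum_xy=2732)
print(round(sxx, 2), round(syy, 2), round(sxy, 2))  # 28.92 637.0 174.5
```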
The following table shows babies’ head circumferences (cm) and the gestation period (weeks) for 6 newborn babies. Calculate Sxx, Syy and Sxy.

Baby     Head size (x)    Gestation period (y)    x²      y²      xy
A        31               36                      961     1296    1116
B        33               37                      1089    1369    1221
C        30               38                      900     1444    1140
D        31               38                      961     1444    1178
E        35               40                      1225    1600    1400
F        30               40                      900     1600    1200

We need n and the column totals:
$$n = 6, \quad \sum x = 190, \quad \sum y = 229, \quad \sum x^2 = 6036, \quad \sum y^2 = 8753, \quad \sum xy = 7255$$
Using n = 6 and the totals for the babies’ head size (x) and gestation period (y):
$$S_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 6036 - \frac{(190)^2}{6} = 19.33$$
$$S_{yy} = \sum y^2 - \frac{(\sum y)^2}{n} = 8753 - \frac{(229)^2}{6} = 12.83$$
$$S_{xy} = \sum xy - \frac{\sum x \sum y}{n} = 7255 - \frac{190 \times 229}{6} = 3.33$$
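Starting from the raw table rather than the totals, the whole babies example can be sketched as follows. The variable names are chosen for this illustration and are not part of the slides.

```python
# Sketch: build the column totals from the raw data, then apply the formulae.
# Variable names are illustrative only.

head_size = [31, 33, 30, 31, 35, 30]   # x, head circumference in cm
gestation = [36, 37, 38, 38, 40, 40]   # y, gestation period in weeks

n = len(head_size)
sum_x = sum(head_size)
sum_y = sum(gestation)
sum_x2 = sum(x * x for x in head_size)
sum_y2 = sum(y * y for y in gestation)
sum_xy = sum(x * y for x, y in zip(head_size, gestation))

sxx = sum_x2 - sum_x ** 2 / n
syy = sum_y2 - sum_y ** 2 / n
sxy = sum_xy - sum_x * sum_y / n

print(sum_x, sum_y, sum_x2, sum_y2, sum_xy)          # 190 229 6036 8753 7255
print(round(sxx, 2), round(syy, 2), round(sxy, 2))   # 19.33 12.83 3.33
```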
Product Moment Correlation Coefficient
We can test the correlation of data by calculating the Product Moment Correlation Coefficient. This uses Sxx, Syy and Sxy.
$$r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}$$
The value of this number tells you what the correlation is and how strong it is.
[Number line from -1 to 1: Negative Correlation towards -1, No Linear Correlation at 0, Positive Correlation towards 1]
The closer to 1, the stronger the positive correlation. The same applies for -1 and negative correlation. A value close to 0 implies no linear correlation.
Given the following data, calculate the Product Moment Correlation Coefficient.
$$S_{xx} = 74, \quad S_{yy} = 150, \quad S_{xy} = 102$$
$$r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}} = \frac{102}{\sqrt{74 \times 150}} = 0.97$$
There is positive correlation: as x increases, y does as well.
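The PMCC formula is short enough to sketch as a single function. The name `pmcc` and the guard against non-positive Sxx or Syy are choices made for this example, not something stated in the slides.

```python
# Sketch: the Product Moment Correlation Coefficient from S_xx, S_yy and S_xy.
# The function name and the input check are illustrative assumptions.
import math

def pmcc(sxx, syy, sxy):
    """Return r = S_xy / sqrt(S_xx * S_yy)."""
    if sxx <= 0 or syy <= 0:
        # r is undefined if either variable does not vary at all.
        raise ValueError("Sxx and Syy must both be positive")
    return sxy / math.sqrt(sxx * syy)

r = pmcc(sxx=74, syy=150, sxy=102)
print(round(r, 2))  # 0.97
```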
Limitations of the Product Moment Correlation Coefficient
Sometimes it may indicate correlation between unrelated variables
• The number of cars on a particular street has increased, as have the sales of DVDs in town
• The PMCC would indicate positive correlation where the two are most likely not linked
• The speed of computers has increased, as has life expectancy amongst people
• These are not directly linked, but are both due to scientific developments
Using Coding with the PMCC
Calculating the PMCC from this table.

x      y      x²       y²        xy
102    320    10404    102400    32640
103    335    10609    112225    34505
102    345    10404    119025    35190
103    355    10609    126025    36565
104    360    10816    129600    37440
103    380    10609    144400    39140

$$n = 6, \quad \sum x = 617, \quad \sum y = 2095, \quad \sum x^2 = 63451, \quad \sum y^2 = 733675, \quad \sum xy = 215480$$
$$S_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 63451 - \frac{(617)^2}{6} = 2.83$$
$$S_{yy} = \sum y^2 - \frac{(\sum y)^2}{n} = 733675 - \frac{(2095)^2}{6} = 2170.83$$
$$S_{xy} = \sum xy - \frac{\sum x \sum y}{n} = 215480 - \frac{617 \times 2095}{6} = 44.17$$
6D
Using Sxx = 2.83, Syy = 2170.83 and Sxy = 44.17 for the x and y values:
$$r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}} = \frac{44.17}{\sqrt{2.83 \times 2170.83}} = 0.563$$
Calculating the PMCC from this table, using coding.
$$p = x - 100, \qquad q = \frac{y - 300}{5}$$

x      y      p    q     p²    q²     pq
102    320    2    4     4     16     8
103    335    3    7     9     49     21
102    345    2    9     4     81     18
103    355    3    11    9     121    33
104    360    4    12    16    144    48
103    380    3    16    9     256    48

$$n = 6, \quad \sum p = 17, \quad \sum q = 59, \quad \sum p^2 = 51, \quad \sum q^2 = 667, \quad \sum pq = 176$$
$$S_{pp} = \sum p^2 - \frac{(\sum p)^2}{n} = 51 - \frac{(17)^2}{6} = 2.83$$
$$S_{qq} = \sum q^2 - \frac{(\sum q)^2}{n} = 667 - \frac{(59)^2}{6} = 86.83$$
$$S_{pq} = \sum pq - \frac{\sum p \sum q}{n} = 176 - \frac{17 \times 59}{6} = 8.83$$
Using Spp = 2.83, Sqq = 86.83 and Spq = 8.83 for the coded values p and q:
$$r = \frac{S_{pq}}{\sqrt{S_{pp} S_{qq}}} = \frac{8.83}{\sqrt{2.83 \times 86.83}} = 0.563$$
So coding will not affect the PMCC!
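The effect of coding can also be checked in code. The sketch below computes r once from the original x and y values and once from the coded values p = x - 100 and q = (y - 300)/5. The helper name `pmcc_from_data` is an assumption made for this example.

```python
# Sketch: the PMCC before and after coding, using the table from the slides.
# The helper name pmcc_from_data is illustrative only.
import math

def pmcc_from_data(xs, ys):
    """PMCC from paired raw data, via the S_xx, S_yy, S_xy formulae."""
    n = len(xs)
    sxx = sum(x * x for x in xs) - sum(xs) ** 2 / n
    syy = sum(y * y for y in ys) - sum(ys) ** 2 / n
    sxy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    return sxy / math.sqrt(sxx * syy)

x = [102, 103, 102, 103, 104, 103]
y = [320, 335, 345, 355, 360, 380]

# Coding: p = x - 100 and q = (y - 300) / 5
p = [xi - 100 for xi in x]
q = [(yi - 300) / 5 for yi in y]

r_xy = pmcc_from_data(x, y)
r_pq = pmcc_from_data(p, q)
print(round(r_xy, 3), round(r_pq, 3))  # 0.563 0.563
```

Subtracting a constant leaves every deviation from the mean unchanged, and dividing by a positive constant scales the top and bottom of the formula for r by the same factor, so the two results agree.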
Summary
• We have looked at plotting scatter graphs
• We have looked at calculating measures of variance, Sxx, Syy and Sxy
• We have also seen types of correlation and how to recognise them on a graph
• We have calculated the Product Moment Correlation Coefficient, and interpreted it. It is a numerical measure of correlation.