Variance Presentation

Recap
All about measures of location
measures of centre
Mean
Median
Mode
You should be able to calculate
these from grouped and raw data
measures of Any Position
Percentiles
You should also be able to draw a
box and whisker plot
MH-Variance -Kuwait
This week
Measures of Spread
Sample of heights of people in Coventry and Norwich
We need more than the mean to compare data sets
We need a numerical measure representing how the data vary
Measures of Spread
Range
Inter Quartile Range
In this hour's lesson we concentrate on how to calculate the following two measures
Variance
Standard Deviation
Range
Range = largest value - smallest value
Range = 615 - 425 = 190
Data (70 values in ascending order, reading down each column):
425  440  450  465  480  510  575
430  440  450  470  485  515  575
430  440  450  470  490  525  580
435  445  450  472  490  525  590
435  445  450  475  490  525  600
435  445  460  475  500  535  600
435  445  460  475  500  549  600
435  445  460  480  500  550  600
440  450  465  480  500  570  615
440  450  465  480  510  570  615
Interquartile Range
• The interquartile range of a data set is the difference between the third quartile and the first quartile.
• It is the range for the middle 50% of the data.
• It overcomes the sensitivity to extreme data values.
[Figure: box and whisker plot of the data on an axis running from 375 to 625]
With n = 70 ordered values:
Position of Q1: L25 = (n+1)*25/100 = 71/4 = 17.75, so take the 18th value
Position of Q3: L75 = (n+1)*75/100 = 71*3/4 = 53.25, so take the 53rd value
1st Quartile (Q1) = 445
3rd Quartile (Q3) = 525
Interquartile Range = Q3 - Q1 = 525 - 445 = 80
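The range and quartile figures above can be checked with a short script. This is only a sketch: it assumes the 70 values from the data table and the slide's rule of rounding the position (n+1)p/100 to the nearest whole rank. The helper name `quartile_value` is made up for illustration, and other textbooks and libraries use slightly different quartile rules.

```python
# The 70 observations from the slide, in ascending order.
data = sorted([
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
])


def quartile_value(sorted_values, percent):
    """Value at position (n + 1) * percent / 100, rounded to the nearest rank."""
    n = len(sorted_values)
    position = (n + 1) * percent / 100  # 17.75 for Q1, 53.25 for Q3
    rank = int(position + 0.5)          # nearest 1-based rank
    return sorted_values[rank - 1]


data_range = max(data) - min(data)
q1 = quartile_value(data, 25)
q3 = quartile_value(data, 75)
iqr = q3 - q1
print(data_range, q1, q3, iqr)
```

The printed values should agree with the slide: range 190, Q1 445, Q3 525, interquartile range 80.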
Basic Notation
As we will be working with formulas, we need to be clear about some notation
Data set “X”
10, 30, 301, 46, 18, 21, 19, 83, 4, .............., 88
The values are labelled in order: x1 = 10, x2 = 30, x3 = 301, ..., xn = 88
We often refer to a data set with an upper case letter like X,
in which case the numbers in the data set are called elements (x1, x2, ..., xn)
“n” or “N” is the number of elements or observations
$\sum_{i=1}^{n} x_i = x_1 + x_2 + x_3 + \dots + x_n$
Net deviations from the mean will always sum to zero
[Figure: data set X with the points x1, x2, x3, x4 and the mean $\bar{x}$ marked]
$\sum_{i=1}^{n} (x_i - \bar{x}) = 0$
So “total distance” from the mean is zero
Because +ve and –ve contributions cancel
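Why the contributions cancel can be written out in one line, using nothing more than the definition of the mean, $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$, so that $\sum x_i = n\bar{x}$:

```latex
\sum_{i=1}^{n} (x_i - \bar{x})
  = \sum_{i=1}^{n} x_i - n\bar{x}
  = n\bar{x} - n\bar{x}
  = 0
```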
Measures of data Spread
• But we want a measure that will represent these net deviations somehow.
• One way to ensure a non-zero result is to square each deviation before adding it.
• We can then average these squared deviations by dividing by their number “n” and use this to compare data sets. This is the Variance (units squared).
• OR, we can average and then take the square root of the above. This is the Standard deviation.
• This latter approach will have the same units as the underlying data.
Calculate the Variance for the following data set
The data are distances travelled to work, in miles (n = 5, mean x̄ = 10.9)

xi      xi - x̄    (xi - x̄)²
10      -0.9      0.81
3.5     -7.4      54.76
27      16.1      259.21
12      1.1       1.21
2       -8.9      79.21
Sum               395.2

$\sigma^2 = \frac{\sum (x_i - \bar{x})^2}{N} = \frac{395.2}{5} = 79.04$
This is the population variance (miles²)
$\sigma = \sqrt{\frac{\sum (x_i - \bar{x})^2}{N}} = \sqrt{\frac{395.2}{5}} = 8.89$
This is the population standard deviation (miles)
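The worked example can be reproduced step by step in a few lines. This is a minimal sketch that treats the five distances as the whole population, as the slide does; the variable names are illustrative only.

```python
import math

distances = [10, 3.5, 27, 12, 2]  # miles travelled to work

n = len(distances)
mean = sum(distances) / n

# Square each deviation from the mean, then add them up.
squared_deviations = [(x - mean) ** 2 for x in distances]
sum_sq = sum(squared_deviations)

pop_variance = sum_sq / n            # units: miles squared
pop_sd = math.sqrt(pop_variance)     # units: miles

print(round(mean, 2), round(sum_sq, 2), round(pop_variance, 2), round(pop_sd, 2))
```

Rounded to two decimal places, the results should match the table: mean 10.9, sum of squares 395.2, variance 79.04 and standard deviation 8.89.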
Population Variance for Grouped Data
Mi is the class midpoint, which takes the place of our xi (here x̄ = 493.21; deviations are displayed to one decimal place)

Rent (€)    fi    Mi      Mi - x̄    (Mi - x̄)²    fi(Mi - x̄)²
420-439     8     429.5   -63.7     4058.96      32471.71
440-459     17    449.5   -43.7     1910.56      32479.59
460-479     12    469.5   -23.7     562.16       6745.97
480-499     8     489.5   -3.7      13.76        110.11
500-519     7     509.5   16.3      265.36       1857.55
520-539     4     529.5   36.3      1316.96      5267.86
540-559     2     549.5   56.3      3168.56      6337.13
560-579     4     569.5   76.3      5820.16      23280.66
580-599     2     589.5   96.3      9271.76      18543.53
600-619     6     609.5   116.3     13523.36     81140.18
Total       70                                   208234.29

Population: $\sigma^2 = \frac{208234.29}{70}$ and $\sigma = \sqrt{\frac{208234.29}{70}}$
Sample: $s^2 = \frac{208234.29}{69}$ and $s = \sqrt{\frac{208234.29}{69}}$
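The grouped calculation can be sketched in code using the midpoints and frequencies from the rent table. Note one assumption: the script uses the unrounded grouped mean, whereas the table appears to use 493.21, so the sum of squares may differ from the table in the third decimal place.

```python
import math

# Class midpoints M_i and frequencies f_i from the rent table.
midpoints = [429.5, 449.5, 469.5, 489.5, 509.5, 529.5, 549.5, 569.5, 589.5, 609.5]
frequencies = [8, 17, 12, 8, 7, 4, 2, 4, 2, 6]

n = sum(frequencies)
grouped_mean = sum(f * m for f, m in zip(frequencies, midpoints)) / n

# Weighted sum of squared deviations: sum of f_i * (M_i - mean)^2.
sum_sq = sum(f * (m - grouped_mean) ** 2 for f, m in zip(frequencies, midpoints))

pop_variance = sum_sq / n            # treat the 70 rents as a population
sample_variance = sum_sq / (n - 1)   # treat them as a sample
pop_sd = math.sqrt(pop_variance)
sample_sd = math.sqrt(sample_variance)

print(round(sum_sq, 2), round(pop_variance, 2), round(sample_variance, 2))
print(round(pop_sd, 2), round(sample_sd, 2))
```

Dividing by 69 rather than 70 makes the sample variance slightly larger than the population variance, and the same holds for the standard deviations.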
Variance for Grouped Data
For sample data
$s^2 = \frac{\sum f_i (M_i - \bar{x})^2}{n-1}$
For population data
$\sigma^2 = \frac{\sum f_i (M_i - \mu)^2}{N}$
Sample variance
$s^2$ is commonly written as $\sigma^2_{n-1}$
Sample Standard Deviation
$s$ is commonly written as $\sigma_{n-1}$
So why is the sample measure divided by (n-1)? We will deal with this soon!
Formulae
RAW DATA
Sample Variance
$s^2 = \frac{\sum (x_i - \bar{x})^2}{n-1} = \frac{\sum x_i^2 - n\bar{x}^2}{n-1}$
RAW DATA
Population Variance
$\sigma^2 = \frac{\sum (x_i - \mu)^2}{N} = \frac{\sum x_i^2 - N\mu^2}{N}$
GROUPED DATA
Sample Variance
$s^2 = \frac{\sum (x_i - \bar{x})^2 f_i}{n-1} = \frac{\sum x_i^2 f_i - n\bar{x}^2}{n-1}$
GROUPED DATA
Population Variance
$\sigma^2 = \frac{\sum (x_i - \mu)^2 f_i}{N} = \frac{\sum x_i^2 f_i - N\mu^2}{N}$
(For grouped data, $x_i$ is the class midpoint $M_i$.)
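A quick numerical check that the two raw-data forms of the sample variance agree. This is a sketch only: the function names are made up for illustration, and the data are the five commuting distances used earlier, treated here as a sample.

```python
import math


def sample_variance_definition(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


def sample_variance_shortcut(values):
    """(Sum of squares minus n times the squared mean), divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return (sum(x * x for x in values) - n * mean ** 2) / (n - 1)


distances = [10, 3.5, 27, 12, 2]
by_definition = sample_variance_definition(distances)
by_shortcut = sample_variance_shortcut(distances)
print(by_definition, by_shortcut)
```

Both calls should give 395.2 / 4 = 98.8, up to floating-point rounding. The shortcut needs only the running totals of x and x², which is why it is quicker by hand, although in floating-point arithmetic it can lose accuracy when the mean is large relative to the spread.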
Things we will now do
1- Understand why the following two formulas are the same, and appreciate that the second form is much quicker to calculate than the first form
$s^2 = \frac{\sum (x_i - \bar{x})^2}{n-1}$
$s^2 = \frac{\sum x_i^2 - n\bar{x}^2}{n-1}$
2- I would like you to think of calculating variance as
$s^2 = \frac{S_{xx}}{n-1}$ or $\sigma^2 = \frac{S_{xx}}{n}$
where $S_{xx}$ can be calculated in different ways, $S_{xx} = \sum (x - \bar{x})^2 = \sum x^2 - n\bar{x}^2$, and is divided by n-1 or n depending on whether we have a sample or a population
3- We should investigate why we divide $S_{xx}$ by (n-1), rather than n, when we are dealing with a sample
We will deal with this third and unusual point next!!
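For point 1, the two forms can be shown to be equal by expanding the square and using $\sum x_i = n\bar{x}$:

```latex
\begin{aligned}
S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2
  &= \sum_{i=1}^{n} x_i^2 - 2\bar{x}\sum_{i=1}^{n} x_i + n\bar{x}^2 \\
  &= \sum_{i=1}^{n} x_i^2 - 2n\bar{x}^2 + n\bar{x}^2 \\
  &= \sum_{i=1}^{n} x_i^2 - n\bar{x}^2
\end{aligned}
```

Dividing both sides by n-1 gives the two expressions for $s^2$.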
Why we divide by (n-1)
[Figure: a population with variance $\sigma^2$, from which a sample is drawn]
We are trying to estimate the true population variance $\sigma^2$
In the real world we take a random sample from the population and use it to estimate $\sigma^2$
From the sample we can calculate $s^2$ (dividing by n-1) or $\sigma_s^2$ (dividing by n)
I am going to show you that $s^2$ will be the better estimator of the true population variance, $\sigma^2$
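Taking lots of samples of fixed size n and building distributions of $s^2$ and $\sigma_s^2$
[Figure: repeated samples drawn from the population, each giving its own value $s_1^2, s_2^2, s_3^2, s_4^2, s_5^2, \dots, s_n^2$]
$\overline{s^2} = \frac{1}{n}\sum_{i=1}^{n} s_i^2$ (the sum runs over the samples taken)
Calculating $s^2$ and $\sigma_s^2$ for many samples, then grouping and counting, we can build distributions for $s^2$ and $\sigma_s^2$
[Figure: the $s^2$ distribution and the $\sigma_s^2$ distribution plotted together, with $\sigma^2$ marked; the $\sigma_s^2$ distribution is centred below $\sigma^2$]
RED distribution is centred around the real population variance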
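Showing $E(s^2) = \sigma^2$
I will generate a population of numbers and calculate the population variance ($\sigma^2$)
Each row of the worksheet is one sample, for which we calculate $s^2$ and $\sigma_s^2$:
Row 1, Sample 1: $s^2$, $\sigma_s^2$
Row 2, Sample 2: $s^2$, $\sigma_s^2$
Row 3, Sample 3: $s^2$, $\sigma_s^2$
Row 4, Sample 4
...
Row 100, Sample 100: $s^2$, $\sigma_s^2$
Averaging down the columns gives AVG($s^2$) and AVG($\sigma_s^2$)
Then show that
AVG($s^2$) ≈ $\sigma^2$
AVG($\sigma_s^2$) < $\sigma^2$
Therefore $E(s^2) = \sigma^2$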
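The experiment described above can be sketched as a small simulation. Everything specific here is an assumption made for illustration, not the lecturer's actual worksheet: a population of 10,000 uniform random numbers, samples of size 5 drawn with replacement, and 20,000 samples rather than 100 so that the averages settle down.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable


def sum_sq_dev(values):
    """S_xx: sum of squared deviations from the mean of `values`."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values)


# Hypothetical population and its true variance (divide by N).
population = [rng.uniform(0, 100) for _ in range(10_000)]
pop_variance = sum_sq_dev(population) / len(population)

sample_size = 5
num_samples = 20_000
s2_values = []        # S_xx / (n - 1)
sigma_s2_values = []  # S_xx / n

for _ in range(num_samples):
    sample = rng.choices(population, k=sample_size)  # with replacement
    sxx = sum_sq_dev(sample)
    s2_values.append(sxx / (sample_size - 1))
    sigma_s2_values.append(sxx / sample_size)

avg_s2 = sum(s2_values) / num_samples
avg_sigma_s2 = sum(sigma_s2_values) / num_samples

print("population variance:", round(pop_variance, 1))
print("AVG(s^2):           ", round(avg_s2, 1))
print("AVG(sigma_s^2):     ", round(avg_sigma_s2, 1))
```

In a run like this, AVG(s²) should land close to the population variance, while AVG(σ_s²) should come out at roughly (n-1)/n of it, which is 80% for samples of size 5. A simulation illustrates the result rather than proving it.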
Summary
We have looked at the formula for calculating the Variance and its square root, the Standard Deviation
We have noted that we divide by n or n-1 depending on whether we are working with a population or a sample
We have noted that we can write $S_{xx} = \sum (x - \bar{x})^2$ in different ways that are faster to calculate.
We should work through these different ways shortly
But first
Some questions