Recap

All about measures of location

• Measures of centre: mean, median, mode. You should be able to calculate these from grouped and raw data.
• Measures of any position: percentiles.
• You should also be able to draw a box-and-whisker plot.

MH-Variance -Kuwait

This week: Measures of Spread

Sample of heights of people in Coventry and Norwich

• We need more than the mean to compare data sets.
• We need a numerical measure representing how the data varies.

Measures of Spread

• Range
• Interquartile range

In this lesson we concentrate on how to calculate the following two measures:

• Variance
• Standard deviation

Range

Range = largest value - smallest value
Range = 615 - 425 = 190

425  440  450  465  480  510  575
430  440  450  470  485  515  575
430  440  450  470  490  525  580
435  445  450  472  490  525  590
435  445  450  475  490  525  600
435  445  460  475  500  535  600
435  445  460  475  500  549  600
435  445  460  480  500  550  600
440  450  465  480  500  570  615
440  450  465  480  510  570  615

Interquartile Range

The interquartile range of a data set is the difference between the third quartile and the first quartile. It is the range of the middle 50% of the data, so it is not sensitive to extreme data values.
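The range and the interquartile range of the 70 values above can be sketched in Python. This is a minimal sketch, not a definitive method: it assumes the position rule L_p = (n + 1) × p / 100 used in this lesson and assumes the position is rounded to the nearest whole position. The function name `value_at_percentile` is made up for illustration.

```python
# The 70 observations from the lesson's data table.
data = [
    425, 440, 450, 465, 480, 510, 575,
    430, 440, 450, 470, 485, 515, 575,
    430, 440, 450, 470, 490, 525, 580,
    435, 445, 450, 472, 490, 525, 590,
    435, 445, 450, 475, 490, 525, 600,
    435, 445, 460, 475, 500, 535, 600,
    435, 445, 460, 475, 500, 549, 600,
    435, 445, 460, 480, 500, 550, 600,
    440, 450, 465, 480, 500, 570, 615,
    440, 450, 465, 480, 510, 570, 615,
]


def value_at_percentile(values, p):
    """Return the value at position (n + 1) * p / 100 in the sorted data.

    Assumption: the position is rounded to the nearest whole position
    (17.75 -> 18th value, 53.25 -> 53rd value), with no interpolation.
    """
    ordered = sorted(values)
    position = (len(ordered) + 1) * p / 100
    # Clamp to a valid 1-based position before converting to a 0-based index.
    index = min(max(round(position), 1), len(ordered))
    return ordered[index - 1]


data_range = max(data) - min(data)   # largest value - smallest value
q1 = value_at_percentile(data, 25)   # first quartile
q3 = value_at_percentile(data, 75)   # third quartile
iqr = q3 - q1                        # range of the middle 50% of the data

print("Range:", data_range)
print("Q1:", q1, "Q3:", q3, "IQR:", iqr)
```

Other quartile conventions (for example interpolating between neighbouring values) can give slightly different answers on other data sets.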
[Figure: the data shown on an axis running from 375 to 625]

Interquartile Range

L25 = (n+1) × 25/100 = 71/4 = 17.75, so Q1 is the 18th value
L75 = (n+1) × 75/100 = 71 × 3/4 = 53.25, so Q3 is the 53rd value

1st Quartile (Q1) = 445
3rd Quartile (Q3) = 525
Interquartile Range = Q3 - Q1 = 525 - 445 = 80

(The data are the same 70 values listed on the Range slide, n = 70.)

Basic Notation

As we will be working with formulas, we need to be clear about some notation.

Data set "X": 10, 30, 301, 46, 18, 21, 19, 83, 4, ..., 88
Labels: $x_1, x_2, x_3, \dots, x_n$

• We often refer to a data set with an upper-case letter like X, in which case the numbers in the data set are called elements ($x_1, x_2, \dots, x_n$).
• "n" or "N" is the number of elements or observations.

$$\sum_{i=1}^{n} x_i = x_1 + x_2 + x_3 + \dots + x_n$$

Net deviations from the mean will always sum to zero

[Figure: points $x_1, x_2, x_3, x_4$ lying on either side of the mean $\bar{x}$]

$$\sum_{i=1}^{n} (x_i - \bar{x}) = 0$$

So the "total distance" from the mean is zero, because the positive and negative contributions cancel.

Measures of data Spread

• But we want a measure that will represent these net deviations somehow.
• One way to ensure a non-zero result is to square each deviation before adding it.
• We can then average these squared deviations by dividing by their number "n" and use this to compare data sets. This is the variance (units squared).
• OR, we can average and then take the square root of the above. This is the standard deviation (units of the data).
• This latter approach will have the same units as the underlying data.
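The steps above can be sketched in Python using the five commuting distances (in miles) from this lesson's worked example. This is an illustrative sketch only; the variable names are made up, and the divisor n treats the five values as a whole population.

```python
import math

# Distances travelled to work, in miles (the lesson's worked example).
distances = [10, 3.5, 27, 12, 2]

n = len(distances)
mean = sum(distances) / n

# Net deviations from the mean: positive and negative values cancel.
deviations = [x - mean for x in distances]
net_deviation = sum(deviations)  # zero, up to floating-point rounding

# Squaring each deviation before adding stops the cancellation.
sum_of_squares = sum(d ** 2 for d in deviations)

# Average the squared deviations: variance, in miles squared.
variance = sum_of_squares / n

# Square root of the variance: standard deviation, back in miles.
std_dev = math.sqrt(variance)

print("mean:", mean)
print("net deviation:", round(net_deviation, 10))
print("sum of squared deviations:", round(sum_of_squares, 2))
print("variance (miles^2):", round(variance, 2))
print("standard deviation (miles):", round(std_dev, 2))
```

The standard deviation is the quantity to quote alongside the mean, because it is in the same units as the data.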
Calculate the Variance for the following data set

This data relates to distance travelled to work, in miles. The mean is 10.9 and n = 5.

$x_i$     $x_i - \bar{x}$     $(x_i - \bar{x})^2$
10        -0.9                0.81
3.5       -7.4                54.76
27        16.1                259.21
12        1.1                 1.21
2         -8.9                79.21
Total                         395.2

$$\sigma^2 = \frac{\sum (x_i - \bar{x})^2}{N} = \frac{395.2}{5} = 79.04$$

This is the population variance (miles²).

$$\sigma = \sqrt{\frac{\sum (x_i - \bar{x})^2}{N}} = \sqrt{\frac{395.2}{5}} = 8.89$$

This is the population standard deviation (miles).

Population Variance for Grouped Data

$M_i$ is the class midpoint; it plays the role of our $x_i$.

Rent (€)    $f_i$    $M_i$    $M_i - \bar{x}$    $(M_i - \bar{x})^2$    $f_i (M_i - \bar{x})^2$
420-439     8        429.5    -63.7              4058.96                32471.71
440-459     17       449.5    -43.7              1910.56                32479.59
460-479     12       469.5    -23.7              562.16                 6745.97
480-499     8        489.5    -3.7               13.76                  110.11
500-519     7        509.5    16.3               265.36                 1857.55
520-539     4        529.5    36.3               1316.96                5267.86
540-559     2        549.5    56.3               3168.56                6337.13
560-579     4        569.5    76.3               5820.16                23280.66
580-599     2        589.5    96.3               9271.76                18543.53
600-619     6        609.5    116.3              13523.36               81140.18
Total       70                                                          208234.29

$$\sigma^2 = \frac{208234.29}{70} \approx 2974.78 \qquad s^2 = \frac{208234.29}{69} \approx 3017.89 \qquad s = \sqrt{\frac{208234.29}{69}} \approx 54.94$$

Variance for Grouped Data

For sample data:

$$s^2 = \frac{\sum f_i (M_i - \bar{x})^2}{n - 1}$$

For population data:

$$\sigma^2 = \frac{\sum f_i (M_i - \mu)^2}{N}$$

• The sample variance $s^2$ is commonly written $\sigma^2_{n-1}$.
• The sample standard deviation $s$ is commonly written $\sigma_{n-1}$.
• So why is the sample measure divided by (n-1)? We will deal with this soon!

Formulae

RAW DATA, Sample Variance

$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{\sum x_i^2 - n\bar{x}^2}{n - 1}$$

RAW DATA, Population Variance

$$\sigma^2 = \frac{\sum (x_i - \mu)^2}{N} = \frac{\sum x_i^2 - N\mu^2}{N}$$

GROUPED DATA, Sample Variance

$$s^2 = \frac{\sum (x_i - \bar{x})^2 f_i}{n - 1} = \frac{\sum x_i^2 f_i - n\bar{x}^2}{n - 1}$$

GROUPED DATA, Population Variance

$$\sigma^2 = \frac{\sum (x_i - \mu)^2 f_i}{N} = \frac{\sum x_i^2 f_i - N\mu^2}{N}$$

Things we will now do

1- Understand why the following two formulas are the same, and appreciate that the second form is much quicker to calculate than the first:

$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} \qquad s^2 = \frac{\sum x_i^2 - n\bar{x}^2}{n - 1}$$

2- I would like you to think of calculating variance as

$$s^2 = \frac{S_{xx}}{n - 1} \quad \text{or} \quad \sigma^2 = \frac{S_{xx}}{n}$$

where $S_{xx}$ can be calculated in different ways,

$$S_{xx} = \sum (x - \bar{x})^2 = \sum x^2 - n\bar{x}^2$$

and is then divided by n - 1 or n depending on whether we have a sample or a population.

3- We should investigate why we average by (n-1) to get $s^2$ when we are dealing with a sample.

We will deal with this third and unusual point next!

Why we divide by (n-1)

[Figure: a population with variance $\sigma^2$ and a sample drawn from it]

• We take a random sample from the population and use it to estimate $\sigma^2$.
• We are trying to estimate the true population variance $\sigma^2$.
• In the real world we take a sample and use it.
• From a sample we can calculate $s^2 = S_{xx}/(n-1)$ or $\sigma_s^2 = S_{xx}/n$.

I am going to show you that $s^2$ is the better estimator of the true population variance, $\sigma^2$.

Taking lots of samples of fixed size n and building distributions of $s^2$ and $\sigma_s^2$

[Figure: many samples drawn from the population, each giving a value of $s^2$ and of $\sigma_s^2$; the values are then averaged over the samples]

By calculating $s^2$ and $\sigma_s^2$ for many samples, then grouping and counting the results, we can build distributions for $s^2$ and $\sigma_s^2$.

[Figure: the $\sigma_s^2$ distribution is centred below $\sigma^2$; the $s^2$ distribution is centred on $\sigma^2$]

The RED distribution is centred around the real population variance.

Showing AVG($s^2$) = $\sigma^2$

Row 1      Sample 1      $s^2$    $\sigma_s^2$
Row 2      Sample 2      $s^2$    $\sigma_s^2$
Row 3      Sample 3      $s^2$    $\sigma_s^2$
Row 4      Sample 4      $s^2$    $\sigma_s^2$
...
Row 100    Sample 100    $s^2$    $\sigma_s^2$
Average                  AVG($s^2$)    AVG($\sigma_s^2$)

• I will generate a population of numbers and calculate the population variance ($\sigma^2$).
• Then show that AVG($s^2$) ≈ $\sigma^2$ while AVG($\sigma_s^2$) < $\sigma^2$.
• This illustrates that E($s^2$) = $\sigma^2$.

Summary

• We have looked at the formula for calculating the variance and its square root, the standard deviation.
• We have noted that we average by n or n-1 depending on whether we are working with a population or a sample.
• We have noted that we can write $S_{xx} = \sum (x - \bar{x})^2$ in different ways that are faster to calculate.

We should work through these different ways shortly. But first, some questions.
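The sampling experiment described in this lesson (generate a population, take many samples of fixed size, average the two variance estimates) can be sketched in Python. This is a sketch under stated assumptions, not the lecturer's own demonstration: the population is 1,000 uniform random numbers, samples of size 5 are drawn with replacement, 20,000 samples are used instead of 100 so that the averages settle, and the helper name `sxx` is made up for illustration.

```python
import random


def sxx(values):
    """Sum of squared deviations from the mean of `values`."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values)


random.seed(1)  # fixed seed so the run is repeatable

# Assumed population: 1,000 numbers drawn uniformly between 0 and 100.
population = [random.uniform(0, 100) for _ in range(1000)]
pop_var = sxx(population) / len(population)  # population variance, divisor N

n = 5                  # fixed sample size
num_samples = 20_000   # assumption: many more than the 100 rows on the slide

s2_values = []         # Sxx / (n - 1) for each sample
sigma_s2_values = []   # Sxx / n for each sample
for _ in range(num_samples):
    # Sampling with replacement (an assumption of this sketch).
    sample = random.choices(population, k=n)
    s = sxx(sample)
    s2_values.append(s / (n - 1))
    sigma_s2_values.append(s / n)

avg_s2 = sum(s2_values) / num_samples
avg_sigma_s2 = sum(sigma_s2_values) / num_samples

print("population variance:", round(pop_var, 1))
print("average of s^2 (divide by n - 1):", round(avg_s2, 1))
print("average of sigma_s^2 (divide by n):", round(avg_sigma_s2, 1))
```

On a run like this the average of the n - 1 version should land close to the population variance, while the average of the divide-by-n version should sit noticeably below it, by a factor of about (n - 1)/n.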