Part II Sigma Freud & Descriptive Statistics
Chapter 3 Vive la Différence: Understanding Variability

What you will learn in Chapter 3
• Variability is valuable as a descriptive tool
• The difference between variance and standard deviation
• How to compute:
  • Range
  • Interquartile range
  • Standard deviation
  • Variance

Why Variability Is Important
Variability is also called spread or dispersion.
Variability describes how different scores are from one particular score. What is the "score" of interest here?
• Ah ha!! It's the MEAN!!
So…variability is really a measure of how much each score in a group of scores differs from the mean of that set of scores.

Measures of Variability
Four measures examine the amount of spread or dispersion in a group of scores:
• Range
• Interquartile range
• Standard deviation
• Variance
We typically report the average and the variability together to describe a distribution.

Computing the Range
The range is the most "general" estimate of variability. There are two types:
• Exclusive range: R = h - l
• Inclusive range: R = h - l + 1
(R is the range, h is the highest score, and l is the lowest score.)

Measures of Variation: Range
• The range is the difference between the highest and lowest numbers in a set of numbers.
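A minimal Python sketch of the two range formulas from "Computing the Range". The function names are illustrative only, and the inclusive version assumes whole-number scores, which is the case where adding 1 makes sense.

```python
def exclusive_range(scores):
    """Exclusive range: R = h - l."""
    return max(scores) - min(scores)


def inclusive_range(scores):
    """Inclusive range: R = h - l + 1 (assumes whole-number scores)."""
    return max(scores) - min(scores) + 1


# Example scores; order does not matter because max() and min() scan the whole list
scores = [2, 35, 77, 93, 120, 540]
print(exclusive_range(scores))  # 538
print(inclusive_range(scores))  # 539
```

The same two functions can be used to check answers to the range practice sets.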
Example: 2, 35, 77, 93, 120, 540
Range = 540 - 2 = 538

Measures of Variation: Range (Practice)
What is the range of each set?
• 2, 3, 3, 3, 4, 5, 6, 6, 7, 9, 11, 13, 15, 15, 15, 16
• 24, 57, 81, 96, 107, 152, 179, 211
• 1001, 1467, 1479, 1680, 1134

Interquartile Range
The interquartile range (IQR) is the difference between the upper (third) and lower (first) quartiles.
Quartiles divide the data into four equal groups:
• The lower (first) quartile is the 25th percentile
• The middle (second) quartile is the 50th percentile and is the median
• The upper (third) quartile is the 75th percentile

Calculating the Interquartile Range for High Temperatures
Date      High Temperature
7-Jan     32
8-Jan     32
6-Jan     35    <== bottom half middle value = first quartile = 35
10-Jan    41
5-Jan     42    <== middle value
4-Jan     43    <== middle value; median = second quartile = (42 + 43)/2 = 42.5
9-Jan     46
11-Jan    52    <== top half middle value = third quartile = 52
2-Jan     59
3-Jan     60
Interquartile range = 52 - 35 = 17

Stem and Leaf: 0730 Q1 Fall 2010 (N = 22)
2 | 3 4 9
3 | 0 3 3 4 4 5 5 5 6 6 6 6 7 7 7 7 9
4 | 0 1
• Q1: .25(22) = 5.5, so round up to the 6th data point = 33
• Q2: (n + 1)/2 = 23/2 = 11.5, so average the 11th and 12th data points = 35.5
• Q3: .75(22) = 16.5, so round up to the 17th data point = 37

Interquartile Range and Outliers
A value can be considered an outlier if it falls more than 1.5 times the interquartile range above the upper quartile or more than 1.5 times the interquartile range below the lower quartile.
Example for high temperatures:
• The interquartile range is 17
• 1.5 times the interquartile range is 25.5
• Outliers would be values:
  • Above 52 + 25.5 = 77.5 (none)
  • Below 35 - 25.5 = 9.5 (none)

Review: Steps to Quartiles, Interquartile Range, and Checking for Outliers
1) Put the values in ascending order
2) Multiply .25(n) to find the position of Q1 (if the result is not a whole number, round up to the next data point)
3) Multiply .75(n) to find the position of Q3 (same rounding)
4) Q3 - Q1 = IQR
5) Q1 - 1.5(IQR) = lower fence; any value below it is an outlier
6) Q3 + 1.5(IQR) = upper fence; any value above it is an outlier

Let's Practice Finding Outliers
What are the median, Q1, Q3, range, and IQR for each of the following sets? Then check for outliers.
• 10, 25, 35, 65, 100, 255, 350, 395 (n = 8)
• 10, 65, 75, 99, 299 (n = 5)
• 5, 39, 45, 59, 64, 74 (n = 6)

Computing Standard Deviation
The standard deviation (SD) is the most frequently reported measure of variability.
SD = the average amount of variability in a set of scores:
$s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$
What do these symbols represent?

Why n - 1?
The standard deviation is intended to be an estimate of the POPULATION standard deviation…
• We want it to be an "unbiased estimate"
• A sample tends to underestimate the variability in the population, so dividing by n - 1 instead of n deliberately makes the SD a little larger
In other words…we want to be "conservative" in our estimate of the population's variability.

Things to Remember…
• The standard deviation is computed as (roughly) the average distance of the scores from the mean
• The larger the standard deviation, the greater the variability
• Like the mean…the standard deviation is sensitive to extreme scores
• If s = 0, there is no variability among the scores…they must all be the same value

Computing Variance
Variance = standard deviation squared:
$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$
So…what do these symbols represent? Does the formula look familiar?

Standard Deviation or Variance?
While the formulas are quite similar…the two measures are also quite different.
• The standard deviation is stated in the original units
• The variance is stated in squared units
• Which do you think is easier to interpret???

Same Mean, Different Standard Deviation: Sample Variance and Sample Standard Deviation
Data set 1: {20, 31, 50, 69, 80}
Score (X)   Mean   Distance from Mean (X - Mean)
20          50     -30
31          50     -19
50          50       0
69          50      19
80          50      30
Then square each distance from the mean and add them together:
$(-30)^2 + (-19)^2 + 0^2 + 19^2 + 30^2 = 900 + 361 + 0 + 361 + 900 = 2522$
Divide by n - 1 (n = 5): 2522/4 = 630.5 = sample variance
To find the sample standard deviation, take the square root of the variance: $\sqrt{630.5} \approx 25.11$

Same Mean, Different Standard Deviation
Data set 2: {39, 44, 50, 56, 61}
Score (X)   Mean   Distance from Mean (X - Mean)
39          50     -11
44          50      -6
50          50       0
56          50       6
61          50      11

Which data set has more variability?
$(-11)^2 + (-6)^2 + 0^2 + 6^2 + 11^2 = 121 + 36 + 0 + 36 + 121 = 314$
Dividing by n - 1 gives the sample variance: 314/4 = 78.5
The square root of 78.5 gives the sample standard deviation: $\sqrt{78.5} \approx 8.86$
Both data sets have a mean of 50, but data set 1 (s = 25.11) has much more variability than data set 2 (s = 8.86).

Measures of Variation: Standard Deviation
How about a more user-friendly equation?
$s = \sqrt{\frac{\sum X^2 - \frac{(\sum X)^2}{n}}{n - 1}}$

Using Excel's VAR Function
(VAR returns the sample variance, dividing by n - 1.)

Using the Computer to Compute Measures of Variability

Glossary Terms to Know
• Variability
• Range
• Standard deviation
• Mean deviation
• Unbiased estimate
• Variance
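To check the hand calculations in this chapter on a computer, here is a minimal Python sketch (an illustration in Python, not the Excel procedure the slides describe). It follows the slides' quartile position rule (multiply p by n and round up to the next data point when the result is not whole) and adds one assumption the slides do not state: when p × n is a whole number, it averages that data point and the next, which reproduces the median rule used for the temperatures. All function names are hypothetical. The standard library's `statistics.stdev` also divides by n - 1, so it serves as a cross-check.

```python
import math
import statistics


def quartile(scores, p):
    """Value at proportion p (0.25, 0.5, 0.75) using a position rule.

    Position = p * n. If it is not a whole number, round up to the next data
    point. ASSUMPTION (not stated in the slides): if it is a whole number,
    average that data point and the next one.
    """
    data = sorted(scores)                 # quartiles need ascending order
    pos = p * len(data)
    if not pos.is_integer():
        return data[math.ceil(pos) - 1]   # 1-based position -> 0-based index
    k = int(pos)
    return (data[k - 1] + data[k]) / 2


def outliers(scores):
    """Values beyond the 1.5 * IQR fences."""
    q1, q3 = quartile(scores, 0.25), quartile(scores, 0.75)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in scores if x < low or x > high]


def sample_variance(scores):
    """Definitional formula: sum of squared distances from the mean, over n - 1."""
    n = len(scores)
    mean = sum(scores) / n
    return sum((x - mean) ** 2 for x in scores) / (n - 1)


def sample_variance_computational(scores):
    """'User-friendly' formula: (sum of X^2 - (sum of X)^2 / n) / (n - 1)."""
    n = len(scores)
    return (sum(x * x for x in scores) - sum(scores) ** 2 / n) / (n - 1)


temps = [32, 32, 35, 41, 42, 43, 46, 52, 59, 60]
print(quartile(temps, 0.25), quartile(temps, 0.5), quartile(temps, 0.75))  # 35 42.5 52
print(outliers(temps))  # []

for data in ([20, 31, 50, 69, 80], [39, 44, 50, 56, 61]):
    var = sample_variance(data)
    # variance, SD from our formula, SD from the standard library
    print(var, round(math.sqrt(var), 2), round(statistics.stdev(data), 2))
```

For the two "same mean" data sets the loop should print 630.5 with an SD of 25.11, then 78.5 with an SD of 8.86, matching the slides. The definitional and computational formulas give the same variance; the computational one simply avoids computing each distance from the mean.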