ST361: Ch2.2 & 2.3 Numerical Summary Measures of Variability for Data

Topics:
Measures of variability (spread):
  o Deviation
  o Variance/standard deviation
  o Interquartile range
Statistical definition of outliers: the 1.5 x IQR criterion for outliers
----------------------------------------------------------------------------------------------------------
Measures of Variability (Spread)

(1) Deviation: the difference between an observation and the mean
The deviation of $x_i$, denoted $d_i$, is ______________________
What is the mean deviation, $\bar{d}$?

(2) Variance / Standard Deviation
Sample Variance: the sample variance $s^2$ of $n$ observations is ____________________________________________________________

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

► An alternative form convenient for calculation: $s^2 =$

Sample Standard Deviation (SD): the sample standard deviation, denoted by ___, is _______________________________________________________

$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$

Ex. Grades of 9 students on a HW assignment: 86, 85, 81, 82, 84, 84, 83, 84, 87, with $\bar{x} = 84$. What is the SD? (Also note that $\sum_{i=1}^{9} x_i^2 = 63532$.)
► Interpretation:

Comments:
1. SD should be used as a measure of spread only when __________ is used as the measure of center.
2. SD = 0 implies ________________________________________________. Why?
3. Like the mean, the SD is strongly influenced by outliers.
Ex. If a coding error changes 87 to 870, then $\bar{x} = 171$ and the SD jumps to about 262.

(3) Interquartile Range (IQR)
(a) Quartiles: To find the quartiles:
1. Sort the data and divide the data points into 2 halves. (If there is an odd number of observations, _________________________ in each half.)
2. Lower quartile Q1 = _____________________________
3. Upper quartile Q3 = ____________________________
(b) Interquartile Range (IQR)
IQR =
Interpretation of IQR:
Ex. Grades of 9 students on a HW assignment: 86, 85, 81, 82, 84, 84, 83, 84, 87.
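The formulas in (1) to (3) can be checked numerically on the HW-grades example. What follows is a minimal Python sketch using only the standard library; every function name in it is a hypothetical helper, not part of any course software. The `sample_variance_shortcut` helper uses the common computational form $(\sum x_i^2 - n\bar{x}^2)/(n-1)$, on the assumption that this is the kind of "alternative form" meant above. For quartiles, the sketch exposes a hypothetical `include_median` flag, because textbooks differ on whether the median of an odd-sized sample belongs to both halves or to neither; which convention applies here is the blank in step 1 of (3)(a). The last lines replay the coding-error example from Comment 3.

```python
import math

def deviations(data):
    """Return the deviations d_i = x_i - x_bar from the sample mean."""
    x_bar = sum(data) / len(data)
    return [x - x_bar for x in data]

def sample_variance(data):
    """Definition form: sum of squared deviations divided by n - 1."""
    n = len(data)
    return sum(d * d for d in deviations(data)) / (n - 1)

def sample_variance_shortcut(data):
    """Assumed computational form: (sum of x_i^2 - n * x_bar^2) / (n - 1)."""
    n = len(data)
    x_bar = sum(data) / n
    return (sum(x * x for x in data) - n * x_bar ** 2) / (n - 1)

def sample_sd(data):
    """Sample standard deviation s = sqrt(s^2)."""
    return math.sqrt(sample_variance(data))

def median(sorted_data):
    """Median of an already-sorted list."""
    n = len(sorted_data)
    mid = n // 2
    if n % 2 == 1:
        return sorted_data[mid]
    return (sorted_data[mid - 1] + sorted_data[mid]) / 2

def quartiles(data, include_median=False):
    """Q1 and Q3 as the medians of the lower and upper halves.

    include_median (hypothetical flag): for odd n, put the overall
    median in both halves (True) or leave it out of both (False).
    """
    s = sorted(data)
    n = len(s)
    half = n // 2
    if n % 2 == 1 and include_median:
        lower, upper = s[:half + 1], s[half:]
    else:
        lower, upper = s[:half], s[n - half:]
    return median(lower), median(upper)

grades = [86, 85, 81, 82, 84, 84, 83, 84, 87]
print("sum of deviations:", sum(deviations(grades)))
print("s^2 (definition):", sample_variance(grades))
print("s^2 (shortcut):  ", sample_variance_shortcut(grades))
print("s:", sample_sd(grades))
print("Q1, Q3 (median excluded):", quartiles(grades))
print("Q1, Q3 (median included):", quartiles(grades, include_median=True))

# Comment 3's coding error: the grade 87 recorded as 870.
coded = [870 if x == 87 else x for x in grades]
print("mean with error:", sum(coded) / len(coded))
print("SD with error:", sample_sd(coded))
```

Both variance helpers should agree up to floating-point rounding; the two quartile conventions can give different Q1 and Q3 on the same nine grades, which is why the convention has to be fixed before comparing IQRs.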
Ex. Rainfall in NC over the past 15 months (stem-and-leaf display):
1 | 0
2 | 25
3 | 45
4 | 11667
5 | 449
6 | 0
7 |
8 | 2
Stem: ones digit; Leaf: tenths digit
Find the quartiles and the IQR.

Remark 1: The 1.5 x IQR Criterion for Outliers
An observation is called an outlier if _____________________________
___________________________________________________________
An extreme outlier indicates ____________________________________
_________________________________________________________
Ex. Use the 1.5 x IQR rule to check whether there are any outliers in the rainfall dataset.

Remark 2: The 5-number summary

Remark 3: Standard numerical summaries of a data set include sample size, center, and spread.
For a reasonably symmetric distribution with no outliers, use _________________
For all other situations, use _________________________________________

Remark 4: Change of Unit
1. Adding (or subtracting) a constant to each observation WILL / WILL NOT (choose one) change the measures of spread, such as SD and IQR.
Ex. If the new score = 60 + the old score, what is the spread of the new score?
Cf. Adding/subtracting a constant WILL / WILL NOT (choose one) change the measures of center and the quartiles.
2. Multiplying each observation by a constant (denoted by a) will ___________ the measures of spread, such as SD and IQR, by ________.
Conclusion: If the new unit is aX + b, then the new spread is _______________________
(Recall that the new center is ________________________)
Ex. Temperatures are read in Fahrenheit; the SD of the temperatures is $s_F$ and the IQR is $r_F$. What are the SD and IQR in Centigrade? Note that $C = \frac{5}{9}(F - 32)$.
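The rainfall example and Remarks 1, 2 and 4 can also be worked numerically. Below is a second minimal Python sketch (standard library only; all function names are hypothetical helpers). The rainfall values are read directly from the stem-and-leaf display. As in the quartile steps of (3)(a), the odd-n convention is left as a hypothetical `include_median` flag rather than assumed. The fences use the 1.5 x IQR criterion from Remark 1 (the multiplier is the parameter `k`); the extreme-outlier threshold is not implemented. The final loop checks how SD and IQR respond to a change of unit aX + b for a few illustrative constants; $(a, b) = (5/9, -160/9)$ is the map $C = \frac{5}{9}(F - 32)$ applied to generic numbers, not to real temperatures.

```python
import math

def median(sorted_data):
    """Median of an already-sorted list."""
    n = len(sorted_data)
    mid = n // 2
    if n % 2 == 1:
        return sorted_data[mid]
    return (sorted_data[mid - 1] + sorted_data[mid]) / 2

def quartiles(data, include_median=False):
    """Q1, Q3 as medians of the lower/upper halves (odd-n convention is a flag)."""
    s = sorted(data)
    n = len(s)
    half = n // 2
    if n % 2 == 1 and include_median:
        lower, upper = s[:half + 1], s[half:]
    else:
        lower, upper = s[:half], s[n - half:]
    return median(lower), median(upper)

def iqr(data, include_median=False):
    """Interquartile range Q3 - Q1."""
    q1, q3 = quartiles(data, include_median)
    return q3 - q1

def outliers_iqr_rule(data, k=1.5, include_median=False):
    """Values more than k * IQR below Q1 or above Q3."""
    q1, q3 = quartiles(data, include_median)
    spread = q3 - q1
    low_fence, high_fence = q1 - k * spread, q3 + k * spread
    return [x for x in sorted(data) if x < low_fence or x > high_fence]

def five_number_summary(data, include_median=False):
    """(minimum, Q1, median, Q3, maximum)."""
    s = sorted(data)
    q1, q3 = quartiles(s, include_median)
    return s[0], q1, median(s), q3, s[-1]

def sample_sd(data):
    """Sample SD with the n - 1 divisor."""
    n = len(data)
    x_bar = sum(data) / n
    return math.sqrt(sum((x - x_bar) ** 2 for x in data) / (n - 1))

# Rainfall values read off the stem-and-leaf display (stem = ones, leaf = tenths).
rainfall = [1.0, 2.2, 2.5, 3.4, 3.5, 4.1, 4.1, 4.6, 4.6, 4.7,
            5.4, 5.4, 5.9, 6.0, 8.2]

print("Q1, Q3:", quartiles(rainfall), "IQR:", iqr(rainfall))
print("outliers (1.5 x IQR):", outliers_iqr_rule(rainfall))
print("5-number summary:", five_number_summary(rainfall))

# Change of unit new = a * x + b; the (a, b) pairs are illustrative only.
for a, b in [(1, 60), (5 / 9, -160 / 9), (-2, 0)]:
    new = [a * x + b for x in rainfall]
    sd_ratio = sample_sd(new) / sample_sd(rainfall)
    iqr_ratio = iqr(new) / iqr(rainfall)
    print(f"a={a:.3f}, b={b:.3f}: SD ratio {sd_ratio:.4f}, IQR ratio {iqr_ratio:.4f}")
```

Comparing the printed ratios with the values of a and b is a quick way to check the answers to the blanks in Remark 4.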