Measurements of Variance (§2.4)

Goals

- Measure fluctuations.
- Study the variance formula.
- Discuss the concentration of data around the center.

Assignment

- Homework §2.4: #7, #13, #25, #33, #39
- Suggested Exercises §2.4: #1, #3, #9, #10

Measuring Variation in Data Sets

Measurements of variation describe how strongly the data entries vary from your chosen measurement of center.

- Range
- Deviation from the mean
- Variance and Standard Deviation

In addition, to help identify unusual data values, use

- the Empirical Rule for symmetric, bell-shaped data;
- Chebychev's Theorem for other shapes.

Deviations

You can compute the deviation from the mean for every data entry:

Deviation of data entry = data entry − mean

For a data set with mean = 24.5:

Data Entry    Deviation
20            -4.5
33             8.5
18            -6.5
24            -0.5
25             0.5
27             2.5

1. What is the total of all the deviations?
2. Is the mean of these deviations useful?

Variance Formula

The deviations from the mean always total zero, so their sum says nothing about spread. To overcome this problem, we use the sum of squares of the deviations:

$SS_x = \sum (x - \mu)^2$ (population parameter)

or

$SS_x = \sum (x - \bar{x})^2$ (sample statistic)

The variance is the average of the squared deviations, that is, the sum of squares divided by N for a population or by n − 1 for a sample:

$\sigma^2 = \frac{\sum (x - \mu)^2}{N}$ (population parameter)

or

$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$ (sample statistic)

The sample formula divides by n − 1 rather than n for technical reasons: doing so makes $s^2$ an unbiased estimate of $\sigma^2$.

Standard Deviation

Taking the square root of the variance gives the standard deviation.

- This is what is used instead of an "average deviation."
- It has the same units of measurement as the data entries.

Steps to find the standard deviation:

1. Find the mean of the data set.
2. Find the deviation for each entry.
3. Square each deviation.
4. Add the squared deviations to get the sum of squares.
5. Divide (by either N or n − 1) to get the variance.
6. Take the square root of the variance to get the standard deviation.

Empirical Rule

Special case: for symmetric, bell-shaped data, approximately

- 68% of the data entries lie within one standard deviation of the mean.
- 95% of the data entries lie within two standard deviations of the mean.
- 99.7% of the data entries lie within three standard deviations of the mean.

Chebychev's Theorem

For data sets of any shape, bell-shaped or not, at least

$1 - \frac{1}{k^2}$

of the data entries lie within k standard deviations of the mean, for any k > 1.
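The six steps for the standard deviation and the Chebychev bound can be sketched in Python using the data set from the Deviations table. This is a minimal illustration, not a prescribed method: the function names (`mean`, `deviations`, `sum_of_squares`, `variance`, `std_dev`, `chebychev_bound`, `fraction_within`) and the `sample` flag are assumptions introduced here, and treating the six entries as a sample (dividing by n − 1) is a choice made only for the example.

```python
import math


def mean(data):
    # Step 1: the mean of the data set.
    return sum(data) / len(data)


def deviations(data):
    # Step 2: deviation of each entry = data entry - mean.
    m = mean(data)
    return [x - m for x in data]


def sum_of_squares(data):
    # Steps 3 and 4: square each deviation, then add.
    return sum(d * d for d in deviations(data))


def variance(data, sample=True):
    # Step 5: divide by n - 1 for a sample, or by N for a population.
    n = len(data)
    divisor = n - 1 if sample else n
    return sum_of_squares(data) / divisor


def std_dev(data, sample=True):
    # Step 6: the square root of the variance.
    return math.sqrt(variance(data, sample))


def chebychev_bound(k):
    # Minimum fraction of entries within k standard deviations (k > 1).
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k**2


def fraction_within(data, k, sample=True):
    # Observed fraction of entries within k standard deviations of the mean.
    m = mean(data)
    s = std_dev(data, sample)
    inside = [x for x in data if abs(x - m) <= k * s]
    return len(inside) / len(data)


data = [20, 33, 18, 24, 25, 27]

print(mean(data))                  # 24.5
print(sum(deviations(data)))       # 0.0, the deviations cancel out
print(sum_of_squares(data))        # 141.5
print(variance(data))              # sample variance, 141.5 / 5
print(round(std_dev(data), 2))     # sample standard deviation
print(chebychev_bound(2))          # 0.75
print(fraction_within(data, 2))    # 1.0, which is at least 0.75
```

For this data set all six entries fall within two standard deviations of the mean, which is consistent with the bound of at least 75% for k = 2. Passing `sample=False` switches the divisor to N for the population versions of the variance and standard deviation.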