Math 3680 Lecture #2
Mean and Standard Deviation

Mean vs. Median

Example: In a certain class of 13 students, 10 showed up for the first exam, while 3 blew it off. Here are the grades, in order:

0 0 0 55 68 78 79 81 84 87 93 94 98

(A) Calculate the class median.
(i) Include all students.
(ii) Ignore the students who slept in.
(B) Calculate the class mean (average).
(i) Include all students.
(ii) Ignore the students who slept in.

Definition: Sample mean. For a data set of size n, the sample mean is

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$

Definition: Population mean. For a finite population of size N, the population mean is

$$\mu = \frac{\sum_{i=1}^{N} x_i}{N}$$

Example: In the grade list above, suppose the student who got a 55 instead got a 15. Would the median change? Would the mean?

Example: Suppose the 98 is replaced by 980. Would the median change? Would the mean? By how much?

Note: The mean is much more sensitive to wild outliers than the median.

Exercise: For registered students at universities in the U.S., which is larger: average age or median age?
Repeat for the heights of 12-year-olds.
Repeat for the weights of 12-year-olds.
Repeat for the scores on a college final exam.

Like the median, the mean only captures central behavior and does not contain information about the spread of the data.

Physical interpretation of the mean: a "balance."
Physical interpretation of the median: half the area lies on each side.

We have just explored the ideas of mean (average), median and mode. These measures give succinct numerical summaries of central tendency.

Exercise: Two different groups of 10 students are given identical quizzes with the following results. Compute the mean, median, and mode.

Group A: 65 66 67 68 71 73 74 77 77 77
Group B: 42 54 58 62 67 77 77 85 93 100

Standard Deviation

Definition: Sample Standard Deviation. For a data set of size n, the sample standard deviation is

$$s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2}$$

1. Square all of the deviations from average.
2. Sum the squares, then divide by n - 1 (the degrees of freedom).
3. Take the square root of the result of step 2.

Intuition: The standard deviation gives a measure of how "spread out" the data is.

Exercise: For each list below, find $\bar{x}$ and s:
(i) 1, 4, 6, 7, 8, 10
(ii) 5, 8, 10, 11, 12, 14
(iii) 3, 12, 18, 21, 24, 30

Example: Each of the following lists has an average of 50. For which one is the SD of the numbers the biggest? Smallest?
0, 20, 40, 50, 60, 80, 100
0, 48, 49, 50, 51, 52, 100
0, 1, 2, 50, 98, 99, 100

Example: For a list of positive numbers, can the SD ever be larger than the average?

For large data sets, Microsoft Excel can compute the mean and standard deviation.

www.math.unt.edu/~allaart/3680/governors.xls

=AVERAGE(A1:E10)
=STDEV(A1:E10)

     A        B        C        D        E
 70000    94780   105000   121391   135000
 77028    95000   105194   122160   144416
 85000    95000   106078   124575   145000
 85000    95000   107482   124855   145132
 85506    95389   110000   125130   150000
 85776    96361   110298   126485   154800
 90000    98331   115345   127303   175000
 93089    98500   117000   131768   175000
 93600   100600   120087   132500   177000
 94532   102704   120303   133162   179000

Average = $115,953.20
SD = $26,810.27

1) The SD says how far away numbers on a list are from their average. Most entries on the list will be somewhere around one SD away from the average. Very few will be more than two or three SDs away.

2) Roughly 68% of the values will be within one SD of the average, and roughly 95% will be within two SDs. (This is only a rule of thumb!)
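The three-step recipe for s and the rule of thumb can be checked on the governors.xls values. The following is a minimal Python sketch, not part of the original slides; the names `data`, `sample_sd`, and `fraction_within` are illustrative assumptions, and the list simply retypes the 50 values shown on the slide.

```python
import math
import statistics

# The 50 values from the governors.xls slide (retyped by hand, sorted).
data = [
    70000, 77028, 85000, 85000, 85506, 85776, 90000, 93089, 93600, 94532,
    94780, 95000, 95000, 95000, 95389, 96361, 98331, 98500, 100600, 102704,
    105000, 105194, 106078, 107482, 110000, 110298, 115345, 117000, 120087, 120303,
    121391, 122160, 124575, 124855, 125130, 126485, 127303, 131768, 132500, 133162,
    135000, 144416, 145000, 145132, 150000, 154800, 175000, 175000, 177000, 179000,
]


def sample_sd(values):
    """Sample standard deviation, following the three steps on the slide."""
    n = len(values)
    mean = sum(values) / n
    squared_devs = [(x - mean) ** 2 for x in values]  # step 1: square the deviations
    variance = sum(squared_devs) / (n - 1)            # step 2: sum, divide by n - 1
    return math.sqrt(variance)                        # step 3: take the square root


def fraction_within(values, k):
    """Fraction of the values lying within k sample SDs of the mean."""
    mean = sum(values) / len(values)
    s = sample_sd(values)
    inside = [x for x in values if abs(x - mean) <= k * s]
    return len(inside) / len(values)


mean = sum(data) / len(data)
s = sample_sd(data)

print(f"Average = {mean:,.2f}")
print(f"SD      = {s:,.2f}")
# Cross-check against the standard library's sample SD (also divides by n - 1).
print(f"statistics.stdev agrees: {math.isclose(s, statistics.stdev(data))}")
print(f"Within 1 SD: {fraction_within(data, 1):.0%}")
print(f"Within 2 SD: {fraction_within(data, 2):.0%}")
```

On this list, 35 of the 50 values (70%) fall within one SD of the average and 46 of 50 (92%) fall within two, close to the 68% and 95% figures but not equal to them, which is why the rule is only a rule of thumb.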
For the governors.xls data:

Average = $115,953.20
SD = $26,810.27

Average - 2 SD = $62,332.66
Average - 1 SD = $89,142.93
Average        = $115,953.20
Average + 1 SD = $142,763.47
Average + 2 SD = $169,573.74

Example: Estimate the mean of the high temperatures recorded in Denton over the past 30 days. Then estimate the standard deviation.

Definition: Population standard deviation.

$$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2}$$

This formula should be used on the (rare) occasion that the entire population is known, not a sample.

Definition: Sample variance:

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$$

Definition: Population variance:

$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2$$

Grouped Data

Find $\mu$ and $\sigma$ for the age of the population under 50% of the poverty threshold.

Q: Why aren't we finding $\bar{x}$ and s?
Q: Will our answer be exact?

To handle grouped data, we pretend that all members of each class are located at the midpoint (called the mark).

Class (age)    Mark    Frequency
 0 to 18        9      5561000
18 to 25       21.5    2507000
25 to 35       30      2155000
35 to 45       40      1792000
45 to 55       50      1540000
55 to 60       57.5     614000
60 to 65       62.5     537000
65 to 85       75       932000

Now compute the mean and standard deviation:

Definition: Grouped mean:

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{m} f_i x_i$$

where m = number of groups, $x_i$ is the mark of group i, $f_i$ is its frequency, and n is the sum of the frequencies.

Definition: Grouped sample variance:

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{m} f_i (x_i - \bar{x})^2$$

Definition: Grouped population variance:

$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{m} f_i (x_i - \mu)^2$$
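The grouped formulas can be applied to the age table directly. The following is a minimal Python sketch, not part of the original slides; the names `marks`, `freqs`, and `grouped_mean_sd` are illustrative assumptions. It treats the table as the whole population, so it divides by N rather than N - 1.

```python
import math

# Class marks (midpoints) and frequencies, retyped from the age table.
marks = [9, 21.5, 30, 40, 50, 57.5, 62.5, 75]
freqs = [5561000, 2507000, 2155000, 1792000, 1540000, 614000, 537000, 932000]


def grouped_mean_sd(marks, freqs):
    """Population mean and SD for grouped data, placing every member at its class mark."""
    total = sum(freqs)                                        # N = sum of frequencies
    mu = sum(f * x for f, x in zip(freqs, marks)) / total     # grouped mean
    variance = sum(f * (x - mu) ** 2 for f, x in zip(freqs, marks)) / total
    return total, mu, math.sqrt(variance)


N, mu, sigma = grouped_mean_sd(marks, freqs)
print(f"N     = {N:,}")
print(f"mu    = {mu:.2f} years")
print(f"sigma = {sigma:.2f} years")
```

Because every member of a class is placed at its mark, the printed values are approximations to the true population mean and SD, not exact figures.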