Boxplots

A boxplot is a pictorial method used to describe
o the center of the data
o the amount of spread in the data
o the symmetry or lack of symmetry in the data
o outliers in the data

Definition: lower fourth and upper fourth
o Sort the n data points from smallest to largest.
o Find the median of the data, call it x̃.
o Divide the data into a lower half and an upper half. If n is odd, include the median in each half.
o The median of the lower half is called the lower fourth.
o The median of the upper half is called the upper fourth.

Example:
Data: 1 4 6 18 40 41 43 44 45 46 48 49 50 58 67 101 256
o median = 45
o lower fourth = 40
o upper fourth = 50

Definition: fourth spread
The fourth spread of a data set, fs, is defined to be
fs = upper fourth - lower fourth

Example: For the data above, fs = 50 - 40 = 10

With the above definitions in mind, one formal way to define outliers is as follows:
o An observation farther than 1.5fs units from the closest fourth is an outlier.
o An observation farther than 3fs units from the closest fourth is an extreme outlier.

Example: For the data above:
o fs = 10, 1.5fs = 15, and 3fs = 30
o lower fourth = 40, upper fourth = 50
o 1, 4, and 6 are extreme outliers: each is more than 30 units from 40 (lower fourth)
o 18 is an outlier: it is more than 15 (but not more than 30) units from 40
o 67 is an outlier: it is more than 15 (but not more than 30) units from 50 (upper fourth)
o 101 and 256 are extreme outliers: each is more than 30 units from 50

Minitab's version, using the data from the example:
o Minitab shows only 101 and 256 as extreme outliers.
o Minitab shows only 2 outliers.
Notice that, setting aside the outliers, the spread below the median is greater than the spread above the median.

[Figure: Minitab boxplot of C1; axis marked at 0, 100, and 200]

Text: Example 1.20, Radon concentration (linked to childhood cancers)
Data: Radon concentration in households in which a child has been diagnosed with cancer.
(Measured in Bq/m³)

Data (stem-and-leaf displays), WITH and WITHOUT:

Stem-and-leaf of With     N = 42
Leaf Unit = 1.0

    1    0  3
    7    0  567899
   17    1  0001111233
  (10)   1  5556667888
   15    2  0112233
    8    2  7
    7    3  34
    5    3  8
         HI 39, 45, 57, 210

Stem-and-leaf of Without  N = 39
Leaf Unit = 1.0

    2    0  33
   14    0  566777889999
   (9)   1  111112234
   16    1  77
   14    2  1144
   10    2  9999
    6    3  3
    5    3  89
         HI 55, 55, 85

Descriptive Statistics: With, Without

Variable     N    Mean  Median  TrMean   StDev  SE Mean
With        42   22.81   16.00   17.97   31.66     4.88
Without     39   19.15   12.00   17.17   16.99     2.72

Variable   Minimum  Maximum      Q1      Q3
With          3.00   210.00   10.75   22.25
Without       3.00    85.00    8.00   29.00

[Figure: comparative boxplots of With and Without; axis marked at 0, 100, and 200]

Note:
o Both the means and the medians suggest that the radon concentration is greater in homes with a diagnosed cancer than in homes without.
o The mean of the group with cancer is affected by the very extreme outlier, 210. The trimmed means are very close (17.97 and 17.17).
o The standard deviations suggest that there is more variation in the group with cancer. However, the fourth spread of the group without is greater than the fourth spread of the group with (Minitab's quartiles point the same way: Q3 - Q1 is 21.00 for Without and 11.50 for With). Here the standard deviation was affected by the extreme outlier (210) in the group with cancer.
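
The fourth, fourth spread, and outlier definitions used in these notes can be sketched in a few lines of Python. This is an illustrative sketch only, not Minitab's procedure: the function names `fourths` and `classify` are made up for this example, and the two radon lists are read off the stem-and-leaf displays. Fourths computed this way need not equal Minitab's Q1 and Q3, which appear to use a different rule.

```python
from statistics import median

def fourths(data):
    """Return (lower fourth, median, upper fourth) as defined in the notes."""
    xs = sorted(data)
    n = len(xs)
    half = (n + 1) // 2          # for odd n the median falls in both halves
    lower, upper = xs[:half], xs[n - half:]
    return median(lower), median(xs), median(upper)

def classify(data):
    """Return (fs, outliers, extreme outliers) using the 1.5fs and 3fs rules."""
    lf, _, uf = fourths(data)
    fs = uf - lf
    mild, extreme = [], []
    for x in sorted(data):
        # distance beyond the closest fourth; 0 for points between the fourths
        dist = max(lf - x, x - uf, 0)
        if dist > 3 * fs:
            extreme.append(x)
        elif dist > 1.5 * fs:
            mild.append(x)
    return fs, mild, extreme

# The 17-point example from the notes
example = [1, 4, 6, 18, 40, 41, 43, 44, 45, 46, 48, 49, 50, 58, 67, 101, 256]
print(fourths(example))    # (40, 45, 50)
print(classify(example))   # (10, [18, 67], [1, 4, 6, 101, 256])

# Radon data, transcribed from the stem-and-leaf displays (Bq/m^3)
with_cancer = [3, 5, 6, 7, 8, 9, 9, 10, 10, 10, 11, 11, 11, 11, 12, 13, 13,
               15, 15, 15, 16, 16, 16, 17, 18, 18, 18, 20, 21, 21, 22, 22,
               23, 23, 27, 33, 34, 38, 39, 45, 57, 210]
without_cancer = [3, 3, 5, 6, 6, 7, 7, 7, 8, 8, 9, 9, 9, 9, 11, 11, 11, 11,
                  11, 12, 12, 13, 14, 17, 17, 21, 21, 24, 24, 29, 29, 29, 29,
                  33, 38, 39, 55, 55, 85]

for name, group in [("With", with_cancer), ("Without", without_cancer)]:
    fs, mild, extreme = classify(group)
    print(name, "fs =", fs, "outliers:", mild, "extreme:", extreme)
```

On the radon data this gives a fourth spread of 11 for With and 18 for Without, in line with the note that the group without cancer has the larger fourth spread even though its standard deviation is smaller.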