Lesson Objectives
Learn what percentiles are and how to calculate quartiles.
Learn to find the five number summary.
Learn how to construct and use boxplots.

Department of ISM, University of Alabama, 1995-2003 M08-Numerical Summaries 2

Sample 100p-th percentile
If x = the 100p-th percentile, then
at least 100p% of the data is ≤ x,
at least 100(1-p)% of the data is ≥ x.
Example: You are told you scored 47; then you hear “47” is at the 82nd percentile.
At least 82% of the sample have scores ≤ 47, AND at least 18% have scores ≥ 47.

Finding the 100p-th percentile
1. Order the data.
2. Calculate np.
3a. If np is NOT an integer, round up; find the observation in this position.
    Example: n = 25, p = 1/3, np = 8.333; the 9th position will be the 33.333rd percentile.
3b. If np IS an integer, say k, then average the kth and (k+1)th ordered values.
    Example: n = 25, p = .40, np = _____ ; the average of the ______ & ____ positions will be the 40th percentile.

Five Number Summary
1. Maximum
2. 3rd Quartile, Q3 = 75th percentile
3. Median
4. 1st Quartile, Q1 = 25th percentile
5. Minimum

Quartiles
1st Quartile (25th percentile): at least 25% of the data values lie at or below it.
3rd Quartile (75th percentile): at least 75% of the data values lie at or below it.
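The percentile steps above (order the data, calculate np, then round up or average) can be sketched in Python. This is a minimal illustration, not part of the original slides: the function name `percentile_np_rule` and the sample data are hypothetical, and `Fraction` is used only so that a product such as 25 × 0.40 is recognized as an exact integer.

```python
from fractions import Fraction
import math


def percentile_np_rule(data, p):
    """Return the 100p-th percentile using the np rule from the slides.

    Hypothetical helper for illustration; assumes 0 < p < 1.
    """
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")

    ordered = sorted(data)              # Step 1: order the data
    n = len(ordered)

    # Step 2: calculate np. limit_denominator snaps a float such as 0.40
    # back to the exact fraction 2/5 before multiplying.
    np_value = Fraction(p).limit_denominator(1000) * n

    if np_value.denominator == 1:
        # Step 3b: np is an integer k, so average the kth and (k+1)th values.
        k = int(np_value)
        return (ordered[k - 1] + ordered[k]) / 2

    # Step 3a: np is not an integer, so round up and take that position.
    k = math.ceil(np_value)
    return ordered[k - 1]


# Hypothetical data: the values 1..25, so each value equals its position.
positions = list(range(1, 26))
third = percentile_np_rule(positions, 1 / 3)    # np = 8.333, 9th position
fortieth = percentile_np_rule(positions, 0.40)  # np = 10, avg of 10th and 11th
```

Because the hypothetical data equal their own positions, `third` is 9 (the 9th position) and `fortieth` is 10.5 (the average of the 10th and 11th positions).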
Method 1: Percentile method
Q1 located at position (n+1)*1/4
Q2 located at position (n+1)*2/4
Q3 located at position (n+1)*3/4

n     Q1    Q2    Q3
5
8
11

Example 6
Step 1: Order the data: 12, 14, 16, 18, 19, 21, 22, 25, 27
Max =
Q3 =
Median =
Q1 =
Min =

Method 2: Median method
Q1 = median of observations below the median’s position.
Q3 = median of observations above the median’s position.

Example 6
Ordered data: 12, 14, 16, 18, 19, 21, 22, 25, 27
Max =
Q3 =
Median =
Q1 =
Min =

4. Interquartile Range (IQR)
IQR = Q3 - Q1
IQR is the range of the middle 50% of the data.
Observations more than 1.5 IQRs beyond the quartiles are considered outliers.

Which summary statistics should I use?
Shape? Location? Variation?

Boxplot
A graphical display of the five number summary (also called a box-and-whiskers plot).

Example 6
Ordered data: 12, 14, 16, 18, 19, 21, 22, 25, 27
Max = 27.0
Q3 = 23.5
Median = 19.0
Q1 = 15.0
Min = 12.0
IQR = 8.5

Example 6A Ordered data: What if . . . .
Example 6B Ordered data: What if . . . .
19, 19, 19, 12, 14, 16, 18, 19, 21, 22, 25, 27
X 12, 14, 16, 18, 19, 21, 22, 25, 27

[Figure: boxplot of the Example 6 data on a vertical axis from 12 to 28.]
Max = 27.0
Q3 = 23.5
Median = 19.0
Q1 = 15.0
Min = 12.0
IQR = 8.5
Note: Middle 50% of data are within the range of the box.

Use side-by-side boxplots to display two variables when one is quantitative and one is categorical.
Useful tool for comparing distributions.

Part Suppliers; who is best?
[Figure: side-by-side boxplots for suppliers A, B, and C on a vertical axis from 14.960 to 15.040.]

Modified Boxplot
More accurate picture of data.
Useful in detecting outliers: observations more than 1.5 IQRs beyond the quartiles are considered outliers.
Available in Minitab (boxplot); not in Excel.

Example 7
Data: 13, 24, 26, 26, 27, 28, 36, 46
Maximum = 46.0
3rd Quartile = 32.0
Median = 26.5
1st Quartile = 25.0
Minimum = 13.0
IQR = 7.0
1.5 IQR = 1.5 • 7.0 = 10.5

[Figure: modified boxplot of the Example 7 data under construction, vertical axis from 12 to 48, with an asterisk (*) beyond each limit.]
Q3 + 1.5 • IQR = 42.5
Q3 = 32.0
Q1 = 25.0
Q1 - 1.5 • IQR = 14.5
Note: Whiskers go to the most extreme value within the limits, not to the limits.

[Figure: finished box plot of the Example 7 data, vertical axis from 12 to 48, with an asterisk (*) at each end.]

Formula Sheet Example
Box Plot: Min, Q1, M
Modified Box Plot: Q1 - 1.5 IQR, with 1.5 IQR marked beyond each quartile
Note: For this problem, no data are below the lower “outlier limit”.
Remaining Box Plot labels: Q3, Max
Modified Box Plot: lines extend to the smallest & largest observations inside of the limits (upper limit Q3 + 1.5 IQR).
Plot each observation that is beyond the “outlier limits” on each end.

Match each of the following descriptions to one of the following histograms.
1. Scores on an EASY Math exam.
2. Heights of a group of students.
3. Number of medals won by medal winning countries in the 1996 Winter Olympics.
4. SAT scores for some college students.
5. Last digit in SSN for 100 people.
[Figure: five histograms labeled A, B, C, D, E.]

Match each of the following Boxplots (1,2,3,4,5) to one of the Histograms (A-E) above.

BoxPlots for Schaeffer Examples
[Figure: five boxplots labeled 1, 2, 3, 4, 5.]

Descriptive Statistics
Variable    N      Mean    Median   Range
A           100    50.6    51.0     20.0
B           100    49.9    50.1     42.6
C           100    49.9    50.6     12.9
D           100    54.1    32.9     415.4
E           100    50.4    49.8     32.9

[Figure: frequency histograms of the variables shown with the numbered boxplots and the descriptive statistics table.]
[Figure: boxplots of variables A through E on a common scale.]
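As a recap of the five number summary and the 1.5 IQR outlier rule behind the modified boxplot, the calculation can be sketched in Python. This is an illustrative sketch, not the slides' own code: the function names are hypothetical, the quartiles use the median method (Method 2), and only the data from Example 6 and Example 7 are used.

```python
from statistics import median


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the median method.

    Hypothetical helper; needs at least two observations.
    """
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]           # observations below the median's position
    upper = ordered[half + n % 2:]   # observations above the median's position
    return ordered[0], median(lower), median(ordered), median(upper), ordered[-1]


def modified_boxplot_parts(data):
    """Return the outlier limits, whisker ends, and outliers for a modified boxplot."""
    _, q1, _, q3, _ = five_number_summary(data)
    step = 1.5 * (q3 - q1)                      # 1.5 IQR
    low_limit, high_limit = q1 - step, q3 + step
    inside = [x for x in data if low_limit <= x <= high_limit]
    # Observations MORE than 1.5 IQRs beyond the quartiles are outliers.
    outliers = sorted(x for x in data if x < low_limit or x > high_limit)
    return {
        "limits": (low_limit, high_limit),
        # Whiskers go to the most extreme values within the limits,
        # not to the limits themselves.
        "whiskers": (min(inside), max(inside)),
        "outliers": outliers,
    }


example_6 = [12, 14, 16, 18, 19, 21, 22, 25, 27]
example_7 = [13, 24, 26, 26, 27, 28, 36, 46]

summary_6 = five_number_summary(example_6)
parts_7 = modified_boxplot_parts(example_7)
```

For Example 6 this reproduces Q1 = 15.0, Median = 19, and Q3 = 23.5. For Example 7 the limits come out as 14.5 and 42.5, so 13 and 46 are plotted as outliers and the whiskers stop at 24 and 36.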