Chapter 2: Describing Distributions with Numbers
BPS - 3rd Ed., Chapter 2

Numerical Summaries
Center of distribution
– mean
– median
Spread of distribution
– five-point summary (& interquartile range)
– standard deviation (& variance)

Mean (Arithmetic Average)
Traditional measure of center
Notation (“xbar”): $\bar{x}$
Sum the values and divide by the sample size (n):
\[ \bar{x} = \frac{1}{n}\left(x_1 + x_2 + \cdots + x_n\right) = \frac{1}{n}\sum_{i=1}^{n} x_i \]

Mean Illustrative Example: “Metabolic Rate”
Data: Metabolic rates, 7 men (cal/day): 1792 1666 1362 1614 1460 1867 1439
\[ \bar{x} = \frac{1792 + 1666 + 1362 + 1614 + 1460 + 1867 + 1439}{7} = \frac{11{,}200}{7} = 1600 \]

Median (M)
Half of the ordered values are less than or equal to the median value
Half of the ordered values are greater than or equal to the median value
If n is odd, the median is the middle ordered value
If n is even, the median is the average of the two middle ordered values

Median
Example 1 data: 2 4 6
Median (M) = 4
Example 2 data: 2 4 6 8
Median = 5 (average of 4 and 6)
Example 3 data: 6 2 4
Median ≠ 2 (order the values first: 2 4 6, so Median = 4)

Location of the Median L(M)
Location of the median: L(M) = (n + 1) / 2, where n = sample size.
Example: If 25 data values are recorded, the median is located at position (25 + 1) / 2 = 13 in the ordered array.

Median Illustrative Example
Data: Metabolic rates, n = 7: 1792 1666 1362 1614 1460 1867 1439
L(M) = (7 + 1) / 2 = 4
Ordered array: 1362 1439 1460 1614 1666 1792 1867
The median is the 4th ordered value: M = 1614

Comparing the Mean & Median
Mean = median when the data are symmetrical
Mean ≠ median when the data are skewed or have an outlier: the mean is ‘pulled’ toward the tail, while the median is more resistant
If we switch this:
1362 1439 1460 1614 1666 1792 1867
to this:
1362 1439 1460 1614 1666 1792 9867
the median is still 1614, but the mean goes from 1600 to 2742.9
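The mean, median, and outlier comparison above can be checked with a short Python sketch. It uses only the standard-library `statistics` module; the helper name `median_location` is introduced here for illustration and does not come from the slides.

```python
from statistics import mean, median

# Metabolic rates (cal/day) for the 7 men in the example above
rates = [1792, 1666, 1362, 1614, 1460, 1867, 1439]


def median_location(n):
    """Position of the median in the ordered array: L(M) = (n + 1) / 2."""
    return (n + 1) / 2


mean_rates = mean(rates)      # 11,200 / 7 = 1600
median_rates = median(rates)  # 4th ordered value = 1614

# Replace the largest value (1867) with the outlier 9867
with_outlier = sorted(rates)
with_outlier[-1] = 9867
mean_out = mean(with_outlier)      # pulled toward the outlier
median_out = median(with_outlier)  # unchanged

print(median_location(len(rates)))     # 4.0
print(mean_rates, median_rates)        # 1600 1614
print(round(mean_out, 1), median_out)  # 2742.9 1614
```

Changing a single value moves the mean by more than 1100 cal/day while the median stays put, which is what "resistant" means here.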
Question
The average salary at a high tech company is $250K / year. The median salary is $60K. How can this be?
Answer: There are some very highly paid executives, but most of the workers make modest salaries

Spread = Variability
Variability: the amount by which values spread above and below the center
Can be measured in several ways:
– range (rarely used)
– 5-point summary & inter-quartile range
– variance and standard deviation

Range
Based on the smallest (minimum) and largest (maximum) values in the data set:
Range = max − min
The range is not a reliable measure of spread (affected by outliers, biased)

Quartiles
Three numbers which divide the ordered data into four equal-sized groups.
Q1 has 25% of the data below it.
Q2 has 50% of the data below it. (Median)
Q3 has 75% of the data below it.

Obtaining the Quartiles
Order the data.
Find the median
– This is Q2
Look at the lower half of the data (those below the median)
– The “median” of this lower half = Q1
Look at the upper half of the data
– The “median” of this upper half = Q3

Illustrative example: 10 ages
AGE (years) values, ordered array (n = 10):
05 11 21 24 27 | 28 30 42 50 52
Q1 = 21
Q2 = average of 27 and 28 = 27.5
Q3 = 42

Weight Data: Sorted
n = 53
100 101 106 106 110 110 119 120 120 123
124 125 127 128 130 130 133 135 139 140
148 150 150 152 155 157 165 165 165 170
170 170 172 175 175 180 180 180 180 185
185 185 186 187 192 194 195 203 210 212
215 220 260
Median: L(M) = (53 + 1) / 2 = 27, placing it at 165
L(Q1) = (26 + 1) / 2 = 13.5, placing it between 127 and 128 (127.5)
L(Q3) = 13.5 from the top, placing it between 185 and 185 (185)

Weight Data: Quartiles
Q1 = 127.5
Q2 = 165
Q3 = 185
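The median-of-halves procedure for quartiles described above can be sketched in Python as follows. The function name `quartiles` is an illustrative choice, and the sketch assumes the rule implied by the weight example: when n is odd, the median itself is left out of both halves. Library routines such as `statistics.quantiles` follow other conventions and may return slightly different values.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves rule.

    Assumption: when n is odd, the median is excluded from both halves.
    Needs at least 2 values, otherwise a half is empty.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    upper = data[half + 1:] if n % 2 else data[half:]
    return median(lower), median(data), median(upper)


# The 10 ages from the illustrative example
ages = [5, 11, 21, 24, 27, 28, 30, 42, 50, 52]

# The 53 sorted weights from the weight example
weights = [
    100, 101, 106, 106, 110, 110, 119, 120, 120, 123,
    124, 125, 127, 128, 130, 130, 133, 135, 139, 140,
    148, 150, 150, 152, 155, 157, 165, 165, 165, 170,
    170, 170, 172, 175, 175, 180, 180, 180, 180, 185,
    185, 185, 186, 187, 192, 194, 195, 203, 210, 212,
    215, 220, 260,
]

age_q = quartiles(ages)
weight_q = quartiles(weights)
weight_range = max(weights) - min(weights)  # Range = max - min

print(age_q)     # Q1 = 21, Q2 = 27.5, Q3 = 42
print(weight_q)  # Q1 = 127.5, Q2 = 165, Q3 = 185
print(weight_range)
```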
Stemplot of the weight data (stem = hundreds and tens, leaf = ones):
10 | 0166
11 | 009
12 | 0034578
13 | 00359
14 | 08
15 | 00257
16 | 555
17 | 000255
18 | 000055567
19 | 245
20 | 3
21 | 025
22 | 0
23 |
24 |
25 |
26 | 0

Five-Number Summary
minimum = 100
Q1 = 127.5
M = 165
Q3 = 185
maximum = 260
Interquartile Range (IQR) = Q3 − Q1 = 185 − 127.5 = 57.5
The IQR gives the spread of the middle 50% of the data

Boxplot
A central box spans Q1 and Q3.
A line in the box marks the median M.
Lines extend from the box out to the minimum and maximum.

Weight Data: Boxplot
[Figure: boxplot of the weight data with min, Q1, M, Q3, and max marked; horizontal axis "Weight", 100 to 275]

Quartile extrapolation
Quartiles divide the data set into 4 segments: bottom, bottom middle, top middle, top
With small data sets, extrapolate the values
Illustrative data: 2, 4, 6, 8
2 | 4 | 6 | 8, with Q1, Q2, and Q3 falling at the three dividers
Q1 = average of 2 and 4, which is 3
Q2 = average of 4 and 6, which is 5
Q3 = average of 6 and 8, which is 7

Boxplots are useful for comparing two groups (text p. 39)

Variances & Standard Deviation
The most common measures of spread
Based on deviations around the mean
Each data value has a deviation, defined as $x_i - \bar{x}$

Fig 2.3: Metabolic Rate for 7 men, with their mean (*) and two deviations shown

Variance
Find the mean
Find the deviation of each value
Square the deviations
Sum the squared deviations: we call this the sum of squares, or SS
Divide the SS by n − 1 (gives a typical squared deviation from the mean)

Variance Formula
\[ s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2 \]

Standard Deviation
Square root of the variance
\[ s = \sqrt{s^2} = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2} \]

Variance and Standard Deviation Illustrative Example
Data: Metabolic rates, 7 men (cal/day): 1792 1666 1362 1614 1460 1867 1439
\[ \bar{x} = \frac{1792 + 1666 + 1362 + 1614 + 1460 + 1867 + 1439}{7} = \frac{11{,}200}{7} = 1600 \]
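The variance recipe above (mean, deviations, squares, SS, divide by n − 1) can be followed step by step in a small Python sketch. The function name `sample_variance` is illustrative; the final lines cross-check the result against the standard library's `statistics.variance` and `statistics.stdev`, which also divide by n − 1.

```python
import statistics
from math import sqrt


def sample_variance(values):
    """Sample variance s^2 = SS / (n - 1), following the steps above."""
    n = len(values)
    xbar = sum(values) / n                      # find the mean
    deviations = [x - xbar for x in values]     # deviation of each value
    ss = sum(d * d for d in deviations)         # sum of squared deviations (SS)
    return ss / (n - 1)                         # divide SS by n - 1


# Metabolic rates (cal/day) for the 7 men
rates = [1792, 1666, 1362, 1614, 1460, 1867, 1439]

s2 = sample_variance(rates)
s = sqrt(s2)  # standard deviation = square root of the variance

print(round(s2, 2), round(s, 2))

# Cross-check against the standard library
lib_s2 = statistics.variance(rates)
lib_s = statistics.stdev(rates)
```

In practice the library call is all that is needed; the hand-rolled version is only there to mirror the five steps.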
Deviations and squared deviations for the seven metabolic rates ($\bar{x} = 1600$):

Observations $x_i$ | Deviations $x_i - \bar{x}$ | Squared deviations $(x_i - \bar{x})^2$
1792 | 1792 − 1600 = 192 | (192)^2 = 36,864
1666 | 1666 − 1600 = 66 | (66)^2 = 4,356
1362 | 1362 − 1600 = −238 | (−238)^2 = 56,644
1614 | 1614 − 1600 = 14 | (14)^2 = 196
1460 | 1460 − 1600 = −140 | (−140)^2 = 19,600
1867 | 1867 − 1600 = 267 | (267)^2 = 71,289
1439 | 1439 − 1600 = −161 | (−161)^2 = 25,921
sum | 0 | SS = 214,870

Variance and Standard Deviation Illustrative Example (cont.)
\[ s^2 = \frac{214{,}870}{7 - 1} = 35{,}811.67 \]
\[ s = \sqrt{35{,}811.67} = 189.24 \text{ calories} \]
Notes:
(1) Use the standard deviation s for descriptive purposes
(2) In practice, the variance & standard deviation are calculated by calculator or computer

Summary Statistics
Two main measures of central location
– Mean ($\bar{x}$)
– Median (M)
Two main measures of spread
– Standard deviation (s)
– 5-point summary (interquartile range)

Choosing Summary Statistics
Use the mean and standard deviation for reasonably symmetric distributions that are free of outliers.
Use the median and IQR (or 5-point summary) when data are skewed or when outliers are present.

Example: Number of Books Read
Ordered data (n = 52):
0 0 0 0 0 0 0 0 0 1
1 1 1 1 2 2 2 2 2 2
2 2 2 3 3 3 3 4 4 4
4 4 4 5 5 5 5 5 5 6
10 10 12 13 14 14 15 15 20 20
30 99
L(M) = (52 + 1) / 2 = 26.5, so M is the average of the 26th and 27th ordered values: M = 3

Illustrative example: “Books read”
5-point summary: 0, 1, 3, 5.5, 99
Note the highly asymmetric distribution
[Figure: boxplot of the books-read data; horizontal axis "Number of books", 0 to 100]
“xbar” = 7.06, s = 14.43
The mean and standard deviation give a false impression with asymmetric data
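The contrast between the two sets of summaries for the books-read data can be reproduced with a brief Python sketch. The data list below is read off the slide's column layout, so treat it as an assumption; it does reproduce the stated 5-point summary, mean, and standard deviation. The helper `five_number_summary` is an illustrative name and uses the median-of-halves rule for the quartiles.

```python
from statistics import mean, median, stdev


def five_number_summary(values):
    """Return (min, Q1, M, Q3, max) using the median-of-halves rule."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    # Assumption: with odd n the median is left out of both halves
    upper = data[half + 1:] if n % 2 else data[half:]
    return (data[0], median(lower), median(data), median(upper), data[-1])


# Books-read data as read from the slide (n = 52); an assumed reconstruction
books = (
    [0] * 9 + [1] * 5 + [2] * 9 + [3] * 4 + [4] * 6 + [5] * 6 + [6]
    + [10, 10, 12, 13, 14, 14, 15, 15, 20, 20, 30, 99]
)

summary = five_number_summary(books)  # 0, 1, 3, 5.5, 99
iqr = summary[3] - summary[1]         # spread of the middle 50%

xbar = mean(books)
s = stdev(books)  # sample standard deviation (divides by n - 1)

print(summary, iqr)
print(round(xbar, 2), round(s, 2))  # 7.06 14.43
```

The median of 3 and IQR of 4.5 describe the typical reader far better than a mean of about 7 with a standard deviation of about 14, both of which are inflated by the single value 99.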