Elementary Statistics
Central Tendency

John M Dusel
jdusel@whittier.edu
Whittier College
Fall 2015

1 Measures of Central Tendency
    The Mean
    Characteristics of the mean
    The Median
    The Mode
2 Finding Central Tendency of Simple Frequency Distributions
    Mean
    Median
    Mode
3 When to use the Mean, Median, or Mode
    Scales of Measurement
    Skewed Distributions

Measures of Central Tendency

The distribution of a random variable X comprises information about the values of X and their frequencies. Frequency distributions and graphs show the form of a distribution.

Definition (Measure of Central Tendency)
A descriptive statistic that indicates a typical or representative score of a distribution: a value that in some sense summarizes the distribution.

If a population is not available, then a parameter cannot be calculated. Researchers instead use statistics based on sample data. Good statistics mimic parameters. Measures of central tendency are good substitutes for their corresponding parameters.

Measures of Central Tendency: The Mean

Definition (Mean of a variable X)
The mean of a sample of X's values is denoted \( \bar{X} \).
    Calculate \( \bar{X} \) using the arithmetic average.
    Different samples ⟹ different \( \bar{X} \) (uncertainty).
The mean of the population that X measures is denoted \( \mu \).
    All population scores are needed to calculate it.
    No uncertainty.

Formula for the sample mean
\[ \bar{X} = \frac{\sum X}{N} \]
    \( \sum \): addition symbol (Greek letter S for "sum")
    X: a score ⟹ \( \sum X \) means add all the Xs
    N: sample size

Example
A college student goes to the Student Center every day and spends money. How much money should (s)he budget per month to pay for this habit?
Data from two weeks: X = dollars spent on a given day.
Day    1      2      3      4      5      6      7
X      3.25   2.50   4.47   0.00   3.81   1.75   0.00

Day    8      9      10     11     12     13     14
X      0.00   6.78   2.40   0.00   0.00   8.50   4.20

\[ \bar{X} = \frac{\sum X}{N} = \frac{\$37.66}{14} = \$2.69 \]

Interpretation: over the 14 sampled days, the student spent $2.69 per day on average.
Therefore \( \mu \approx \$2.69 \). (Exercise: Interpret. Hint: Population = ?)
Monthly estimated expenses = \( 30 \cdot \bar{X} \) = 30 days · $2.69/day = $80.70.

Measures of Central Tendency: The Mean

Special properties of the mean

1. \( \bar{X} \) is the "balance point" of X's distribution: subtract \( \bar{X} \) from each score in X's distribution, and the sum will always vanish:
\[ \sum (X - \bar{X}) = 0 \]
2. \( \bar{X} \) is a "least squares" minimizer:
\[ \sum (X - \bar{X})^2 < \sum (X - Y)^2 \quad \text{for any number } Y \neq \bar{X} \]

Derivations of the mean's special properties

1. Property \( \sum (X - \bar{X}) = 0 \): Observe that
\[ \sum (X - \bar{X}) = \sum X - N\bar{X} = \sum X - N \cdot \frac{\sum X}{N} = 0 \]
2. Property \( \sum (X - \bar{X})^2 < \sum (X - Y)^2 \) for any number \( Y \neq \bar{X} \): \( \sum (X - Y)^2 \) is quadratic in Y, specifically
\[ \sum (X - Y)^2 = \sum X^2 - 2Y \sum X + N Y^2 \]
Calculus: the minimizing value of Y is the solution to \( \frac{d}{dY} \sum (X - Y)^2 = 0 \):
\[ -2 \sum X + 2NY = 0 \]
\[ NY = \sum X \]
\[ Y = \frac{\sum X}{N} = \bar{X} \]
Because the coefficient of \( Y^2 \) is \( N > 0 \), this critical point is the unique minimum.

Measures of Central Tendency: The Median

Definition (Median of a random variable)
The point that divides a distribution of scores in half.
The median is located at position \( \frac{N+1}{2} \) in the ordered arrangement.
(N even: average the scores at positions \( \frac{N}{2} \) and \( \frac{N}{2} + 1 \).)

Example (Student Center expenditure data)

Day    13     9      3      14     5      1      2
$      8.50   6.78   4.47   4.20   3.81   3.25   2.50

Day    10     6      4      7      8      11     12
$      2.40   1.75   0.00   0.00   0.00   0.00   0.00

N = 14 ⟹ average the 7th and 8th scores:
8.50, 6.78, 4.47, 4.20, 3.81, 3.25, 2.50 | 2.40, 1.75, 0.00, 0.00, 0.00, 0.00, 0.00
(7 scores on each side of the divider)
\[ \text{Median} = \frac{\$2.50 + \$2.40}{2} = \$2.45 \]

N = 13 ⟹ use the 7th score (6 scores on each side of it).

One $0.00 dropped: Median = $2.50.
8.50, 6.78, 4.47, 4.20, 3.81, 3.25, [2.50], 2.40, 1.75, 0.00, 0.00, 0.00, 0.00

Score $8.50 dropped: Median = $2.40.
6.78, 4.47, 4.20, 3.81, 3.25, 2.50, [2.40], 1.75, 0.00, 0.00, 0.00, 0.00, 0.00

Measures of Central Tendency: The Mode

Definition (Mode)
Most frequently occurring score. Its relative frequency is \( \frac{f_{\text{mode}}}{N} \), where \( f_{\text{mode}} \) is the number of times the mode occurs.

Example (Student Center expenditure data)
Mode = $0.00 occurs 5/14 ≈ 36% of the time.

Definition (Measures of Central Tendency)
Mean: \( \mu \) or \( \bar{X} = \frac{\sum X}{N} \).
Median: located at position \( \frac{N+1}{2} \) in the ordered arrangement. (N even: average the scores at positions \( \frac{N}{2} \) and \( \frac{N}{2} + 1 \).)
Mode: most frequently occurring score. Relative frequency \( \frac{f_{\text{mode}}}{N} \).

Finding Central Tendency of Simple Frequency Distributions: Mean

The mean of a variable X is \( \mu \) or \( \bar{X} = \frac{\sum X}{N} \) (N = sample size, X = scores).
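The raw-score formulas can be checked numerically on the Student Center data. The following is a minimal sketch, assuming Python's standard `statistics` module; the variable names (`spending`, `deviation_sum`, and so on) are illustrative and do not come from the slides.

```python
import statistics

# Student Center expenditure data (dollars per day, days 1-14).
spending = [3.25, 2.50, 4.47, 0.00, 3.81, 1.75, 0.00,
            0.00, 6.78, 2.40, 0.00, 0.00, 8.50, 4.20]

n = len(spending)
mean = sum(spending) / n                    # X-bar = (sum of X) / N
median = statistics.median(spending)        # N even: averages the two middle ordered scores
mode = statistics.mode(spending)            # most frequently occurring score
mode_rel_freq = spending.count(mode) / n    # f_mode / N

# Balance-point property: deviations from the mean sum to zero,
# up to floating-point rounding error.
deviation_sum = sum(x - mean for x in spending)

print(f"mean = {mean:.2f}, median = {median:.2f}, mode = {mode:.2f}")
print(f"relative frequency of the mode = {mode_rel_freq:.0%}")
print(f"|sum of deviations| < 1e-9: {abs(deviation_sum) < 1e-9}")
```

The deviation sum is compared against a small tolerance rather than exact zero because the dollar amounts are stored as binary floating-point numbers.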
Frequency distribution: \( \bar{X} = \frac{\sum fX}{N} \), where f = frequency of a score X.

Example (Raw test scores of a group of N = 24 college students)

X      f      f·X
97     1      97
94     2      188
93     1      93
89     1      89
86     2      172
85     2      170
83     1      83
82     3      246
79     1      79
78     1      78
77     1      77
75     1      75
71     1      71
70     1      70
68     1      68
66     1      66
60     1      60
57     1      57
50     1      50

\[ \sum fX = 97 + 188 + 93 + 89 + 172 + 170 + 83 + 246 + 79 + \cdots + 57 + 50 = 1889 \implies \bar{X} = \frac{1889}{24} = 78.71 \]

Population = students in this class: \( \mu = 78.71 \).
Population = all college students: \( \mu \) unknown, but \( \mu \approx 78.71 \).

Finding Central Tendency of Simple Frequency Distributions: Median

The median is located at position \( \frac{N+1}{2} \).

Example (Raw test scores of a group of N = 24 college students, frequency table above)
The median is located at position "25/2", that is, the average of the scores at positions 12 and 13.
Count frequencies from the top until positions 12 and 13 are reached:
97, 94, 94, 93, 89, 86, 86, 85, 85, 83, 82, 82 (#12), 82 (#13)
Median = 82.

Finding Central Tendency of Simple Frequency Distributions: Mode

Modal score: the score with the highest frequency.

Example (Raw test scores of a group of N = 24 college students, frequency table above)
Mode = 82 (f = 3).

When to use the Mean, Median, or Mode: Scales of Measurement

The median requires order, and the mean requires quantitative relationships between the scores.

Nominal: Mode.
    Meaningful question: What is the most frequently occurring area code?
    Meaningless question: What is the mean or median area code?
Ordinal: Mode or median.
    Meaningful questions: What is the most frequently occurring class standing (freshman) or the middle class standing (sophomore) at Whittier?
    Meaningless question: What is the mean class standing?
Interval: Mode or median or mean.
Ratio: Mode or median or mean.

When to use the Mean, Median, or Mode: Skewed Distributions

The mean is affected by extreme values in a distribution. The median is less so.

Example (Florida housing development)

Textbook excerpt (Chapter 3, page 48):

… them by explaining that the average elevation of the lots is 78.5 feet and that the water level has never exceeded 25 feet in that area. On the average, the developer has told the truth, but this average truth is misleading. Look at the actual lay of the land in Figure 3.1 and examine the frequency distribution in Table 3.4. The mean elevation, as the developer said, is 78.5 feet; however, only 20 lots, all on a hill, are out of the flood zone. The other 80 lots are, on the average, under water. Using the mean is misleading. The median is a much better measure of central tendency here because it is unaffected by the few extreme lots on the hill. The median elevation is 12.5 feet, well below the high-water mark. (Because our interest is in only this one development, the lot elevations constitute a population of data. There is no interest in generalizing from these data to some larger group.)

In summary, use the mean if it is appropriate. To follow this advice you must recognize data for which the mean is not appropriate. Perhaps Table 3.5 will help.

[Figure 3.1: Elevation of Swampy Acres. 20 lots on the hill, 80 lots below.]

X = elevation of lot (ft above sea level).
X          f      fX
348–352    20     7000
13–17      30     450
8–12       30     300
3–7        20     100

N = 100, \( \sum fX = 7850 \)

Positively skewed! Flood water level 25 feet ⟹ 80 lots underwater!
\( \mu = 7850/100 = 78.5 \) ft looks safe (misleading).
Median = 12.5 ft is a more accurate description of reality.
The mean is not recommended for skewed distributions!

When to use the Mean, Median, or Mode: Skewed Distributions

The mean is affected by extreme values in a distribution. The median is less so.

Textbook excerpt:

… distribution to be positively skewed. If the mean is smaller than the median, expect negative skew. Figure 3.2 shows the relationship of the mean to the median for a positively skewed and a negatively skewed distribution of continuous data.

The size of the difference between the mean and median usually indicates the degree of skew. The greater the difference, the greater the skew. To illustrate, I added an expenditure of $100.00 to the slightly skewed distribution in Table 3.1. The original mean was $2.69 and the median was $2.45. Adding a score of $100.00 produces a much more skewed distribution with a mean of $9.18 and a median of $2.50. In the original distribution the difference between the mean and median was $0.24; in the …

[Figure 3.2: The effect of skewness on the relative position of the mean and median for continuous data. Panels: "Mean smaller than median: skew is negative" and "Mean larger than median: skew is positive". Axes: Scores (low to high) against Frequency.]
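The frequency-table formula \( \bar{X} = \frac{\sum fX}{N} \) and the mean-versus-median comparison can be reproduced in a few lines. This is a minimal sketch, assuming each Swampy Acres lot is represented by its interval midpoint (fX divided by f gives 350, 15, 10, and 5 ft). The helper names `expand` and `freq_mean` are hypothetical and not part of the slides.

```python
import statistics

def expand(table):
    """Turn (score, frequency) pairs into the full list of scores."""
    return [score for score, freq in table for _ in range(freq)]

def freq_mean(table):
    """Mean from a frequency table: sum(f * X) / N, where N = sum(f)."""
    n = sum(freq for _, freq in table)
    return sum(score * freq for score, freq in table) / n

# Swampy Acres: (midpoint elevation in ft, number of lots).
# Assumption: every lot in an interval sits at the interval midpoint.
swampy_acres = [(350, 20), (15, 30), (10, 30), (5, 20)]

mean_elevation = freq_mean(swampy_acres)
median_elevation = statistics.median(expand(swampy_acres))
lots_below_flood = sum(freq for score, freq in swampy_acres if score < 25)

# Raw test scores of the N = 24 college students: (score, frequency).
test_scores = [(97, 1), (94, 2), (93, 1), (89, 1), (86, 2), (85, 2), (83, 1),
               (82, 3), (79, 1), (78, 1), (77, 1), (75, 1), (71, 1), (70, 1),
               (68, 1), (66, 1), (60, 1), (57, 1), (50, 1)]

score_mean = freq_mean(test_scores)
score_median = statistics.median(expand(test_scores))

print(f"Swampy Acres: mean = {mean_elevation} ft, median = {median_elevation} ft")
print(f"Lots below the 25 ft flood level: {lots_below_flood}")
print(f"Test scores: mean = {score_mean:.2f}, median = {score_median}")
```

For Swampy Acres the mean lands far above the median, which is the signature of positive skew described in the slides. For the test scores the two measures are close together.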