Numerical Descriptive Measures

Definitions
• The central tendency locates the central value in a data set.
• The variation measures how close to the centre, or how dispersed (scattered) from it, the observations are.
• The shape is the pattern of the distribution of values from the lowest value to the highest value.

Describing Data Numerically
• Central Tendency: Arithmetic Mean, Median, Mode
• Dispersion: Range, Interquartile Range, Variance, Standard Deviation, Coefficient of Variation

Measures of Central Tendency: Calculating the Mean, Median and Mode
Purpose: To determine the “centre” of the data values.

The Mean
The mean is also known as the average.

Calculating the Sample Mean from raw data
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$
where $\bar{x}$ (pronounced "x-bar") is the sample mean, $x_i$ is the ith observation (the values taken by x), and n is the sample size (the number of observations).

Example 1
The number of work days lost due to illness in a business per week is given below (for a 10-week period):
36, 28, 33, 29, 28, 32, 33, 33, 34, 32
Calculate the mean number of days lost per week during the above period.
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n} = \frac{36 + 28 + 33 + \cdots + 32}{10} = \frac{318}{10} = 31.8$$

Exercise 1
The following are the ages (in years) of all eight employees of a small company:
53, 32, 61, 27, 39, 44, 49, 57
Find the mean age of these employees.
Answer: 45.25 years

Properties of the Sample Mean
• Uniqueness: for a given set of data there is one and only one mean.
• Affected (distorted) by extreme values (outliers). For the values 1, 2, 3, 4, 5 the mean is 3; replacing the 5 with 10 raises the mean to 4:
$$\frac{1 + 2 + 3 + 4 + 5}{5} = \frac{15}{5} = 3 \qquad \frac{1 + 2 + 3 + 4 + 10}{5} = \frac{20}{5} = 4$$
• May be better replaced by the median when the distribution of the data is skewed.
• An important property of the mean is that it includes every value in your data set as part of the calculation.

The Median
The median is the value of the middle observation in a dataset.
Calculating the Median from raw data
Step 1: First, arrange the observations in ascending order.
Step 2: Then, find the middle position, using the following formula if n is an odd number:
$$\text{Median position} = \frac{n + 1}{2}$$
If n is an even number, this position falls halfway between two observations, and the median is the average of those two middle values.
Step 3: The median value is in the median position.

Example 1
Find the median for the following data set.
27 38 12 34 42 40 24 40 23
The ordered set becomes

Observation  12  23  24  27  34  38  40  40  42
Rank          1   2   3   4   5   6   7   8   9

The median position is $\frac{9 + 1}{2} = 5$, i.e. the 5th rank (observation).
Therefore the median = 34.

Exercise 1
Sambiri Silicon manufactures computer monitors. The following data are the numbers of computer monitors produced at the company for a sample of 10 days. Find the median.
24 31 27 25 35 33 26 40 25 28

Properties of the Median
• In an ordered array, the median is the “middle” number (50% above, 50% below).
• Uniqueness: there is only one median for each set of data.
• Not affected by extreme values.
[Figure: two number lines from 0 to 10, each with Median = 3]

The Mode
The mode is the most frequently occurring value in a dataset.

Calculating the Mode from raw data
Step 1: First, arrange the observations in ascending order.
Step 2: The mode is the most frequently occurring value in the dataset.

Example 1
Find the mode for the data below.
7.00 19.00 23.00 34.22 11.00 14.25 15.00 15.00 15.50 19.00 19.00 19.00 21.00 22.00 24.00 25.00 27.00 27.00 28.00 43.25
The mode is 19.00 because it recurs the most times, i.e. four (4) times.

Properties of the Mode
• Normally, the mode is used for categorical data where we wish to know which is the most common category.
• Not affected by extreme values.
• The mode is not unique.
[Figure: number line from 0 to 14 with Mode = 9; number line from 0 to 6 with No Mode]
• There can be one mode.
• There can be several modes, in which case we are stuck as to which mode best describes the central tendency of the data.
This is particularly problematic when we have continuous data because we are more likely not to have any one value that is more frequent than the others. For example, consider measuring 30 people's weight (to the nearest 0.1 kg). How likely is it that we will find two or more people with exactly the same weight (e.g., 67.4 kg)? The answer is that it is very unlikely: many people might be close, but with such a small sample (30 people) and a large range of possible weights, you are unlikely to find two people with exactly the same weight, that is, to the nearest 0.1 kg. This is why the mode is very rarely used with continuous data.

Question
When re-ordering stock, the most common hat or jeans size is what you would like to know, not the average hat or jeans size.

The Shape: Skewness
The shape is the pattern of the distribution of values from the lowest value to the highest value.
[Figures: symmetric histogram; two skewed histograms]

Measures of skewness
• Pearson’s coefficient
• Bowley’s coefficient (Galton’s coefficient)

Basic Business Statistics, 11e © 2009 Prentice-Hall, Inc.

Measures of Central Tendency: Summary
• Sample Mean: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
• Median: middle value in the ordered array
• Mode: most frequently observed value
• Geometric Mean: $\bar{x}_G = (x_1 \times x_2 \times \cdots \times x_n)^{1/n}$, used for the rate of change of a variable over time

Measures of Dispersion
Which dataset has the larger variation?
[Figure: Dataset 1 and Dataset 2]
• Population 1: narrow range, smaller variation, smaller deviation, observations clustered.
• Population 2: wide range, larger variation, larger deviation, observations spread out.
[Figure: Population 1 and Population 2, same centre, different variation]

The measures of central tendency (the mean, median and mode) do not reveal the whole picture of the distribution of the dataset. Two datasets with the same mean may have completely different spreads. The amount or degree of spread is known as variation.
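The point that two datasets can share a mean yet differ in spread can be illustrated with a minimal Python sketch. Both datasets below are hypothetical values invented for illustration (they do not come from the slides), and the spread is summarised simply as the largest value minus the smallest.

```python
from statistics import mean

# Hypothetical datasets, invented for illustration only.
dataset_1 = [48, 49, 50, 51, 52]   # observations clustered near the centre
dataset_2 = [10, 30, 50, 70, 90]   # observations spread out


def spread(values):
    """Largest observation minus smallest observation."""
    return max(values) - min(values)


for name, data in (("Dataset 1", dataset_1), ("Dataset 2", dataset_2)):
    print(f"{name}: mean = {mean(data)}, largest - smallest = {spread(data)}")
```

Both datasets have the same mean, so the mean alone cannot distinguish them; only a measure of spread shows that the second is far more dispersed.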
Measures of Dispersion
Variation: Range, Variance, Standard Deviation, Coefficient of Variation
Measures of variation give information on the spread or variability or dispersion of the data values.
[Figure: same centre, different variation]

Measures of Dispersion: The Range
$$\text{Range} = X_{\text{largest}} - X_{\text{smallest}}$$
Example (values plotted on a scale from 0 to 14): Range = 13 - 1 = 12

Measures of Dispersion: Why The Range Can Be Misleading
• Ignores the way in which data are distributed.
[Figure: two dot plots on a scale from 7 to 12, each with Range = 12 - 7 = 5]
• Sensitive to outliers.
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5
Range = 5 - 1 = 4
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120
Range = 120 - 1 = 119

The Sample Variance
Variance is used to measure the dispersion of values relative to the mean.
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}{n - 1}$$
where
$\bar{x}$ = arithmetic mean
n = sample size
$x_i$ = ith observation of the variable x

The Sample Standard Deviation
• Most commonly used measure of variation.
• Tells us how much observations in our sample differ from the mean value within our sample.
• Has the same units as the original data, making it easier to interpret.
$$s = \sqrt{s^2}$$

Example
For the sample data $x_i$: 2, 3, 5, 1, 4, 3, 2, 4, find:
1. Sample variance
2. Sample standard deviation

The variation or dispersion in a set of values refers to how spread out the values are from each other.
• The variation is small when the values are close together.
• There is no variation if the values are the same.
[Figure: smaller variation; larger variation]

The Coefficient of Variation
The variance and the standard deviation are useful as measures of variation of the values of a single variable for a single population (or sample).
If we want to compare the variation of two variables we cannot use the variance or the standard deviation because:
1. The variables might have different means.
2. The variables might have different units.

The Coefficient of Variation
• Measures variation relative to the mean.
• Expressed as a percentage (%).
$$CV = \frac{s}{\bar{x}} \times 100\%$$
The coefficient of variation compares the variability of two different datasets even if they have different units of measurement.

Example 1
Spot, the dog, weighs 65 pounds. Spot’s weight fluctuates 5 pounds depending on Spot’s exercise level. Sea Biscuit, the horse, weighs 1200 pounds. Sea Biscuit’s weight fluctuates 125 pounds depending on the number of rides Sea Biscuit goes on.

Coefficient of Variation
Some financial investors use the coefficient of variation as a measure of risk. What does the coefficient of variation tell us about the risk of a stock that the standard deviation does not? Relative to the amount invested in a stock, the coefficient of variation reveals the risk of a stock in terms of the size of the standard deviation relative to the size of the mean (as a percentage).

Example 2
Relative to the amount of money invested in the stock, which stock, A or B, is riskier?

                     Stock A   Stock B
Average price        $50       $100
Standard deviation   $5        $5

Comparing Coefficients of Variation
$$CV_A = \frac{s}{\bar{x}} \times 100\% = \frac{5}{50} \times 100\% = 10\%$$
$$CV_B = \frac{s}{\bar{x}} \times 100\% = \frac{5}{100} \times 100\% = 5\%$$
Comparing the coefficients of variation, the relative variation is higher in stock A than in stock B, so stock A is the riskier stock relative to the amount invested.

Example 3
The yearly salaries of all employees who work for a company have a mean of $62,350 and a standard deviation of $6,820. The years of experience for the same employees have a mean of 15 years and a standard deviation of 2 years. Is the relative variation in the salaries larger or smaller than that in the years of experience for these employees?
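One way to work through Example 3 is to apply the coefficient of variation formula, CV = (s / mean) × 100%, to each variable. The sketch below does this in Python; the function name `coefficient_of_variation` is a hypothetical helper introduced for illustration, and the inputs are the means and standard deviations stated in the example.

```python
def coefficient_of_variation(std_dev, mean_value):
    """Return the coefficient of variation as a percentage: (s / mean) * 100."""
    return std_dev / mean_value * 100


# Values taken from Example 3.
cv_salary = coefficient_of_variation(6820, 62350)   # salaries, in dollars
cv_experience = coefficient_of_variation(2, 15)     # experience, in years

print(f"CV of salaries:   {cv_salary:.2f}%")
print(f"CV of experience: {cv_experience:.2f}%")

if cv_salary < cv_experience:
    print("Salaries show the smaller relative variation.")
else:
    print("Salaries show the larger relative variation.")
```

Under this calculation the salaries have a coefficient of variation of about 10.94% and the years of experience about 13.33%, so the relative variation in salaries is the smaller of the two, even though the two variables are measured in different units.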
Interpretation
• A low (%) value shows low variability, implying tight clustering of observations about the mean.
• A middle to high (%) value shows high variability, implying that observations are widely spread.

Measures of Position for ungrouped data (Quartile Measures)

Quartile Measures
Quartiles split the ranked data into 4 equal segments:
25% | Q1 | 25% | Q2 | 25% | Q3 | 25%
• The first quartile (lower quartile), Q1, is the value below which 25% of the observations lie.
• The second quartile (middle quartile), Q2, is the same as the median, so 50% of the observations lie below it.
• The third quartile (upper quartile), Q3, is the value below which 75% of the observations lie.

Q1 = 25th percentile = P25
Q2 = 50th percentile = P50
Q3 = 75th percentile = P75

Locating Quartile Positions
Step 1: First, arrange the observations in ascending order.
Step 2: Find the quartile positions using the following formulas.
$$Q_1 \text{ position} = 0.25(n + 1)$$
$$Q_2 \text{ position} = 0.5(n + 1)$$
$$Q_3 \text{ position} = 0.75(n + 1)$$
Step 3: Determine the quartile values.

The Interquartile Range (IQR)
Remember that the range can be distorted by outliers. The IQR excludes these outliers and focuses on the spread of the middle 50% of the data values. The IQR is also called the 50% mid-spread range.
$$IQR = Q_3 - Q_1$$
Weakness: the IQR, like the range, provides no information on the clustering of observations within the dataset, as it uses only two observations in its computation.

Example 1
Given Sample Data in Ordered Array: 11 12 13 16 16 17 18 21 22
Find:
1. Q1 and Q3
2. IQR

Locating First Quartile, Q1
11 12 13 16 16 17 18 21 22 (n = 9)
Q1 is in the 0.25(9 + 1) = 2.5th position of the ranked data, so use the value halfway between the 2nd and 3rd values:
$$Q_1 = \frac{12 + 13}{2} = 12.5 \quad \text{or} \quad Q_1 = 12 + \frac{13 - 12}{2} = 12.5$$

Locating Third Quartile, Q3
11 12 13 16 16 17 18 21 22 (n = 9)
Q3 is in the 0.75(9 + 1) = 7.5th position of the ranked data, so use the value halfway between the 7th and 8th values.
$$Q_3 = \frac{18 + 21}{2} = 19.5 \quad \text{or} \quad Q_3 = 18 + \frac{21 - 18}{2} = 19.5$$

The Interquartile Range (IQR)
$$IQR = Q_3 - Q_1 = 19.5 - 12.5 = 7.0$$

Example 2
Given Sample Data in Ordered Array: 7 8 9 10 11 12 13 13 14 17 17 45
Find:
1. Q1 and Q3
2. IQR

Locating First Quartile, Q1
7 8 9 10 11 12 13 13 14 17 17 45 (n = 12)
Q1 is in the 0.25(12 + 1) = 3.25th position of the ranked data, a quarter of the way from the 3rd value to the 4th. First find the value halfway between the 3rd and 4th values, which is $\frac{9 + 10}{2} = 9.5$, then take the value halfway between 9 and 9.5:
$$Q_1 = \frac{9 + 9.5}{2} = 9.25 \quad \text{or} \quad Q_1 = 9 + \frac{10 - 9}{4} = 9.25$$

Locating Third Quartile, Q3
7 8 9 10 11 12 13 13 14 17 17 45 (n = 12)
Q3 is in the 0.75(12 + 1) = 9.75th position of the ranked data, three quarters of the way from the 9th value to the 10th. First find the value halfway between the 9th and 10th values, which is $\frac{14 + 17}{2} = 15.5$, then take the value halfway between 15.5 and 17:
$$Q_3 = \frac{15.5 + 17}{2} = 16.25 \quad \text{or} \quad Q_3 = 17 - \frac{17 - 14}{4} = 16.25$$

The Interquartile Range (IQR)
$$IQR = Q_3 - Q_1 = 16.25 - 9.25 = 7.0$$

End of Chapter

Grouped data: Mean, Variance, CV
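The quartile-location rule used in Examples 1 and 2 above (position = p(n + 1), interpolating between neighbouring ranks when the position is not a whole number) can be sketched in Python as follows. The function name `quartile` and the handling of positions that fall outside the data are assumptions made for this illustration; they are not part of the slides.

```python
def quartile(values, p):
    """Locate a quartile with the p * (n + 1) position rule.

    values: the observations (sorted here before use)
    p: 0.25 for Q1, 0.5 for Q2, 0.75 for Q3
    """
    ordered = sorted(values)
    n = len(ordered)
    position = p * (n + 1)        # 1-based rank, may be fractional
    rank = int(position)          # whole part of the position
    fraction = position - rank    # how far to move towards the next rank
    # Assumption: clamp to the smallest/largest value if the position
    # falls outside the ranks 1..n (only matters for very small samples).
    if rank < 1:
        return ordered[0]
    if rank >= n:
        return ordered[-1]
    lower = ordered[rank - 1]     # value at the whole-number rank
    upper = ordered[rank]         # value at the next rank
    return lower + fraction * (upper - lower)


example_1 = [11, 12, 13, 16, 16, 17, 18, 21, 22]
example_2 = [7, 8, 9, 10, 11, 12, 13, 13, 14, 17, 17, 45]

for data in (example_1, example_2):
    q1 = quartile(data, 0.25)
    q3 = quartile(data, 0.75)
    print(f"Q1 = {q1}, Q3 = {q3}, IQR = {q3 - q1}")
```

For the two example datasets this reproduces the hand calculations: Q1 = 12.5, Q3 = 19.5, IQR = 7.0 for Example 1, and Q1 = 9.25, Q3 = 16.25, IQR = 7.0 for Example 2. Note that the outlier 45 in Example 2 has no effect on the IQR. Statistical software may use slightly different interpolation rules, so results can differ from this convention.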