Summary Descriptive Measures

Location is an indicator of where the data is located.
Scale is a measure of how “spread out” data is.

[Figure: histograms of "Projects Completed Early" for Plant A and Plant B; axes: Percent, Percentage]

Criteria for Measures of Location and Scale

Must be well defined for:
    Raw Data
    Grouped Data
    Theoretical Curves
For Business Purposes: Must be arithmetic

Measures of Location: Mode

Simply the most frequent value in a data set.

Problems:

Raw Data: Many data sets have no repeated values, so the mode does not exist.

Grouped Data: The mode is taken as the midpoint of the bin with the greatest frequency. But consider the data discussed in the last lecture.

[Figure: two histograms titled "Histogram of Labor Costs" (Frequency versus Labor Cost), one with axis ticks at 20, 30, ..., 80 and one with axis ticks at 25, 35, ..., 75]

Theoretical Data: The mode may not exist; consider the theoretical distribution of random numbers, which should look like:

[Figure: Uniform Density Function; f(x) versus x = random number, for x from 0 to 1]

Measures of Location: Median

The median is that data value which has approximately the same percentage of observations below it as above it (for large data sets this proportion will approach 50%). The word “median” comes from the Latin word “medius”, meaning “middle”.

Raw Data: Finding the median from raw data is a two-step process. First put the data in order, then find the middle value.

Example:
    Data = 3, -1, 6, 10, 11
    Ordered Data = -1, 3, 6, 10, 11
    Median = 6

If the sample size n is odd, the median is the value occupying position (n+1)/2 in the ordered data.

Example:
    Data = 3, -1, 6, 10, 11, 7
    Ordered Data = -1, 3, 6, 7, 10, 11
    Median = any value between 6 and 7. Usually the two middle points are averaged to get 6.5.
If the sample size n is even, the median is the arithmetic average of the values occupying positions n/2 and (n/2)+1 in the ordered data.

Notice:
    The median is not computed, it is found. For example, replace the value of 11 in the above example by 12,000. The median remains 6.5.
    The median cannot be manipulated algebraically.

Finding the Median of Raw Data Using EXCEL

Open the file “thickdat.xls” in the MBA Mod 1 folder. Find an empty cell and type in

    =median(

Then highlight the range of the data. You should see something that looks like the following:

[Screenshot: Excel worksheet with the data range highlighted]

Finally, type in the right parenthesis. The result is 355, which is the average of the 30th and 31st values, both of which happen to be 355.

Finding the Median from Grouped Data

Suppose you did not have the raw data for steel thickness, but only had the data grouped as shown below:

    Interval            m(i) Midpoint    f(i) Freq    F
    341.5 to 344.5      343               1            1
    344.5 to 347.5      346               3            4
    347.5 to 350.5      349               8           12
    350.5 to 353.5      352               8           20
    353.5 to 356.5      355              20           40
    356.5 to 359.5      358              13           53
    359.5 to 362.5      361               5           58
    362.5 to 365.5      364               2           60

Using the column labeled “F” (the cumulative frequency), it is clear that the 30th and 31st observations lie in the interval [353.5 to 356.5]. Altogether there are 20 observations in the interval [353.5 to 356.5]. Since there are 20 observations below 353.5, we need 10 more to get to the 30th value.

ASSUMPTION: The data points in the interval are equi-spaced throughout the interval.

To get the 30th value, we need to go 10/20ths (or .5) of the way into the interval. Since the bin is 3 units wide, we need to go a distance of (10/20)*3 = 1.5 into the interval. Therefore we estimate the 30th value as 353.5 + 1.5 = 355.

To get the 31st value, we need to go 11/20ths (or .55) of the way into the interval. Since the bin is 3 units wide, we need to go a distance of (11/20)*3 = 1.65 into the interval. Therefore we estimate the 31st value as 353.5 + 1.65 = 355.15.

The median is estimated as median = (355 + 355.15)/2 = 355.075.
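The raw-data rule and the grouped-data interpolation above can be sketched in Python. This is a minimal sketch, not the Excel procedure from the lecture: the function names (median_raw, kth_value_grouped, median_grouped) and the constants are hypothetical, and the grouped estimate relies on the same equi-spacing assumption and equal bin widths as the text.

```python
def median_raw(data):
    """Median of raw data: put the data in order, then find the middle."""
    ordered = sorted(data)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    if n % 2 == 1:
        # Odd n: the value in position (n+1)/2 (1-based) of the ordered data.
        return ordered[(n + 1) // 2 - 1]
    # Even n: average the values in positions n/2 and n/2 + 1.
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2


def kth_value_grouped(k, lower_edges, width, freqs):
    """Estimate the k-th ordered observation (1-based) from grouped data.

    Assumes, as in the text, that observations are equi-spaced within a bin
    and that every bin has the same width.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    below = 0  # cumulative frequency F of all bins before the current one
    for lower, f in zip(lower_edges, freqs):
        if below + f >= k:
            # Go (k - below)/f of the way into this bin.
            return lower + (k - below) / f * width
        below += f
    raise ValueError("k exceeds the number of observations")


def median_grouped(lower_edges, width, freqs):
    """Apply the odd/even median rule to the interpolated positions."""
    n = sum(freqs)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    if n % 2 == 1:
        return kth_value_grouped((n + 1) // 2, lower_edges, width, freqs)
    low = kth_value_grouped(n // 2, lower_edges, width, freqs)
    high = kth_value_grouped(n // 2 + 1, lower_edges, width, freqs)
    return (low + high) / 2


# Steel thickness table from the text (lower bin edges and frequencies).
LOWER_EDGES = [341.5, 344.5, 347.5, 350.5, 353.5, 356.5, 359.5, 362.5]
FREQS = [1, 3, 8, 8, 20, 13, 5, 2]
BIN_WIDTH = 3.0

if __name__ == "__main__":
    print(median_raw([3, -1, 6, 10, 11]))        # 6
    print(median_raw([3, -1, 6, 10, 11, 7]))     # 6.5
    print(median_raw([3, -1, 6, 10, 12000, 7]))  # still 6.5
    # Approximately 355.075, matching the hand calculation.
    print(round(median_grouped(LOWER_EDGES, BIN_WIDTH, FREQS), 3))
```

Replacing 11 by 12,000 leaves median_raw unchanged, which is the sense in which the median is "found" rather than computed from the magnitudes of all the values.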
Finding the Median From Theoretical Probability Distributions

If f(x) is the probability density function of x, the median is that value med satisfying the integral equation:

$$\int_{-\infty}^{\mathrm{med}} f(x)\,dx = 0.5$$

Problems with the Median

Suppose you had two groups of people. In Group 1 you had 50 people with a median hourly wage of $15.00 per hour. In Group 2 you had 100 people with a median hourly wage of $17.00 per hour. Given this information, can you determine the median hourly wage of all 150 people?

Consider the following data:

                Time 1    Time 2    Change
                  5         4        -1
                 10        12         2
                 15        18         3
                 20        19        -1
                 25        23        -2
    median       15        18        -1

Change in median is 18 - 15 = 3
Median change is -1
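Both problems can be checked numerically. The sketch below uses Python's standard statistics.median on the Time 1 / Time 2 table above. The wage groups are small hypothetical data sets chosen only for illustration (they are not the 50- and 100-person groups from the text): they share the same group medians of 15 and 17 but give different pooled medians.

```python
from statistics import median

# Data from the Time 1 / Time 2 table in the text.
time1 = [5, 10, 15, 20, 25]
time2 = [4, 12, 18, 19, 23]
changes = [after - before for before, after in zip(time1, time2)]

change_in_median = median(time2) - median(time1)  # 18 - 15 = 3
median_change = median(changes)                   # median of -1, 2, 3, -1, -2 is -1

# Hypothetical wage groups (illustration only, not from the lecture).
# In both scenarios Group 1 has median 15 and Group 2 has median 17.
group1 = [15, 15, 15]
group2_a = [17, 17, 17, 17, 17]
group2_b = [1, 2, 17, 18, 19]

pooled_a = median(group1 + group2_a)  # 17
pooled_b = median(group1 + group2_b)  # 15

if __name__ == "__main__":
    print("change in median:", change_in_median)
    print("median change:", median_change)
    print("pooled medians:", pooled_a, pooled_b)
```

In this sketch the group sizes and group medians are identical across the two scenarios, yet the pooled medians differ, so the group medians alone do not pin down the overall median.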