DESCRIBING DATA: NUMERICAL MEASURES
Dr. Ashish Chandra, College of Business
Chapter Three

Objective 1
Compute and interpret the mean, the median, and the mode.

Ways to Measure the Central Location of Data
A measure of location is a value used to describe the central tendency of a set of data.
Common measures of location:
• Mean
• Median
• Mode
The mean is the most widely reported measure of location.

Measuring the Central Location of Data via the Mean
There are two ways to compute the mean:
(1) Population mean
(2) Sample mean

(1) Population mean: a property of the population (the average of all data points).
Definition. PARAMETER: any characteristic of a population. The population mean is a parameter.
Question: What type of statistics are we doing when we compute a population mean? (Recall Chapter 1.)
Answer: Descriptive statistics.

Example: Population Mean
There are 42 exits on I-75 through the state of Kentucky. Listed below are the distances between exits (in miles).
1. Why is this information a population? The table has 42 data points because we consider all exits on I-75 in Kentucky.
2. What is the mean number of miles between exits? With \( N = 42 \), \( \mu = \frac{\sum x}{N} = 4.57 \).
Interpretation: 4.57 miles is the typical (average) distance between exits. It is a population parameter.

Sample Mean
STATISTIC: a characteristic of a sample. Any measurable characteristic or property of a sample is called a statistic.
The sample mean is a statistic.
Question: What type of statistics are we doing when we compute a sample mean? (Recall Chapter 1.)
Answer: Inferential statistics.

Example: Sample Mean
What is the sample mean number of hours of mobile usage?

Properties of the Mean
• All the data values are used in the calculation of the mean.
• The mean is unique.
• The sum of the deviations from the mean (population or sample) is zero.
• Weakness: the mean is affected by extreme values.

The Median
MEDIAN: the midpoint of the values after they have been ordered from minimum to maximum or from maximum to minimum.
• It is the value in the middle of a set of ordered data.
• It describes the center of data sets that contain one or just a few extreme values.
• For an odd number of data points, the median is the single middle value of the ordered data.

Finding the Median
To find the median of a data set with an even number of observations, sort the observations and average the two middle values.
The number of hours a sample of 10 adults used Facebook last month: 3 5 7 5 9 1 3 9 17 10
Arranging the data in ascending order gives: 1 3 3 5 5 7 9 9 10 17
The two middle values are 5 and 7, so the median is \( (5 + 7)/2 = 6 \).

Characteristics of the Median
• The median is the value in the middle of a set of ordered data.
• It is not influenced by extreme values.
• When the mean is not representative of the data, use the median.
• Fifty percent of the observations are larger than the median.
• It is unique to a set of data.

The Mode
MODE: the value of the observation that occurs most frequently.
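As a quick check on the three measures of location defined so far, the following Python sketch recomputes them for the Facebook sample of 10 adults. The function names are illustrative and not part of the course material, and the `modes` helper assumes the convention used in this chapter that a data set with no repeated value has no mode.

```python
from collections import Counter


def mean(values):
    """Arithmetic mean: sum of the values divided by how many there are."""
    return sum(values) / len(values)


def median(values):
    """Middle value of the sorted data (average of the two middle values if n is even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def modes(values):
    """All values tied for the highest frequency; empty list if nothing repeats."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []  # assumption: no repeated value means "no mode"
    return sorted(v for v, c in counts.items() if c == top)


facebook_hours = [3, 5, 7, 5, 9, 1, 3, 9, 17, 10]
print(mean(facebook_hours))    # 6.9
print(median(facebook_hours))  # 6.0
print(modes(facebook_hours))   # [3, 5, 9]
```

The mean (6.9) sits above the median (6) because the single large value, 17, pulls it upward, which is the weakness of the mean noted earlier.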
Finding the Mode
1) Construct a frequency table.
2) Identify the entry or entries with the highest frequency.
A set of data can have more than one mode. A set of data could also have no mode.

Given data: 1 3 3 5 5 7 9 9 10 17
Entry      1  3  5  7  9  10  17
Frequency  1  2  2  1  2   1   1
More than one mode: 3, 5, 9

Given data: 1 3 5 7 9 10 17
Entry      1  3  5  7  9  10  17
Frequency  1  1  1  1  1   1   1
No mode, as no entry is repeated.

Given data: sports
Entry      Tennis  Soccer  Baseball
Frequency  6       7       3
Mode: Soccer

Objective 2
We studied the mean, median, and mode: they measure the central tendency of the data set.
We will now study dispersion: it measures the variation or spread in the data set.

Why Study Dispersion?
Dispersion measures the variation or spread in the data set.
The mean in both cases is 50, but the data in the first case are less spread out than in the second.
• Less spread: the data are closely clustered around the mean, so the mean is a good representative of the data.
• Large spread: the data points are more scattered, so the mean may not be a good representative of the data.

Ways to Measure Dispersion
1) Range
2) Variance
3) Standard deviation

1) Range = Maximum value - Minimum value
• Only 2 data values are used in its computation.
• It is very quick to compute.
• It takes the extreme values into account.

Variance: Population Variance
Population variance (\( \sigma^2 \)):
\[ \sigma^2 = \frac{\sum (x - \mu)^2}{N} \]
\( \sigma^2 \) = population variance (\( \sigma \) is the Greek letter "sigma"; read \( \sigma^2 \) as "sigma squared")
\( x \) = value of an observation in the population
\( \mu \) = population mean
\( N \) = number of observations in the population

Properties of the Population Variance
• All observations are used in the calculation.
• Unit of the population variance: think about the unit of the mean, median, mode, and range.

Understanding the Unit of the Population Variance
Unit of the mean, median, mode, and range: centimeters
Unit of the variance: centimeters squared, because each deviation is squared:
\[ \frac{\sum (100\,\text{cm} - 50\,\text{cm})^2}{100{,}000} \quad \text{has unit } \text{cm}^2 \]

Computing Population Variance
The number of traffic citations issued last year, by month, in Normal is reported below. What is the population variance?
Step 1: Compute the population mean \( \mu \).
Step 2: Compute a table of the deviations \( x - \mu \) and the squared deviations \( (x - \mu)^2 \).
Step 3: Sum the squared deviations and divide by \( N \).
The population variance for the number of citations is 124.

Population Standard Deviation (\( \sigma \))
Population variance (\( \sigma^2 \)):
\[ \sigma^2 = \frac{\sum (x - \mu)^2}{N} \]
Population standard deviation (\( \sigma \)):
\[ \sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}} \]
Important points to note:
• Unit of the population variance (\( \sigma^2 \)): centimeters squared
• Unit of the population standard deviation (\( \sigma \)): centimeters
• The population standard deviation and the population variance cannot be negative.
• Population variance: the average squared distance of the observations from the population mean \( \mu \).
• Population standard deviation: the square root of the average squared distance of the observations from \( \mu \).

Population vs Sample
Population variance:
\[ \sigma^2 = \frac{\sum (x - \mu)^2}{N} \]
Population standard deviation:
\[ \sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}} \]
Sample variance:
\[ s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} \]
Sample standard deviation:
\[ s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} \]
\( \mu \): population mean
\( \bar{x} \): sample mean obtained from the sample
\( N \): number of observations in the population
\( n \): number of observations in the sample
\( x \): observations from the population (in the population formulas) or from the chosen sample (in the sample formulas)
Units of the sample variance and population variance?
Units of the sample standard deviation and population standard deviation?

Interpretation of Standard Deviation
THE EMPIRICAL RULE (sample or population): ONLY for a symmetrical, bell-shaped frequency distribution, approximately 68% of the observations will lie within plus and minus one standard deviation of the mean, about 95% will lie within plus and minus two standard deviations of the mean, and practically all (99.7%) will lie within plus and minus three standard deviations of the mean.
• Symmetrical
• Bell shaped

Say the mean (sample or population) = 100 and the standard deviation = 10. Then:
68% of the observations lie in [Mean - St. Deviation, Mean + St. Deviation] = [100 - 10, 100 + 10] = [90, 110]
95% of the observations lie in [Mean - 2 x St. Deviation, Mean + 2 x St. Deviation] = [100 - 2 x 10, 100 + 2 x 10] = [80, 120]
99.7% of the observations lie in [Mean - 3 x St. Deviation, Mean + 3 x St. Deviation] = [100 - 3 x 10, 100 + 3 x 10] = [70, 130]
How is the above analysis useful if you have a symmetrical, bell-shaped frequency distribution with 10 billion observations?

Interpretation of Standard Deviation
What should we do when we have a NON-symmetrical, NON-bell-shaped frequency distribution?
Example: Dupree Paint Company employees contribute a mean of $51.54 to the company's profit-sharing plan every two weeks. The standard deviation of biweekly contributions is $7.51. At least what percent of the contributions lie within plus 3.5 standard deviations and minus 3.5 standard deviations of the mean, that is, between $25.26 and $77.83?
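The question above asks for a guaranteed minimum share without assuming a bell shape. The standard tool for that is Chebyshev's theorem, which the slide itself does not name: for any distribution and any k > 1, at least 1 - 1/k^2 of the observations lie within k standard deviations of the mean. The Python sketch below applies it to the Dupree Paint figures. For comparison, it also reproduces the Empirical Rule intervals for the example with mean 100 and standard deviation 10, and evaluates the sample variance formula on the Facebook sample from earlier in the chapter. All function names are illustrative assumptions, not part of the course material.

```python
import math


def variance(values, sample=False):
    """Average squared deviation from the mean.

    Divides by N for a population, or by n - 1 when sample=True.
    """
    n = len(values)
    center = sum(values) / n
    squared_deviations = sum((x - center) ** 2 for x in values)
    return squared_deviations / (n - 1 if sample else n)


def std_dev(values, sample=False):
    """Standard deviation: square root of the variance (same unit as the data)."""
    return math.sqrt(variance(values, sample=sample))


def interval(center, sd, k):
    """Interval [center - k*sd, center + k*sd]."""
    return (center - k * sd, center + k * sd)


def chebyshev_min_share(k):
    """Chebyshev's theorem: at least 1 - 1/k^2 of values lie within k standard deviations."""
    if k <= 1:
        raise ValueError("Chebyshev's bound is only informative for k > 1")
    return 1 - 1 / k ** 2


# Sample variance and standard deviation for the Facebook sample (n = 10).
facebook_hours = [3, 5, 7, 5, 9, 1, 3, 9, 17, 10]
print(round(variance(facebook_hours, sample=True), 2))
print(round(std_dev(facebook_hours, sample=True), 2))

# Empirical Rule intervals for mean 100 and standard deviation 10.
for k, share in [(1, "68%"), (2, "95%"), (3, "99.7%")]:
    print(share, interval(100, 10, k))

# Dupree Paint: mean $51.54, standard deviation $7.51, k = 3.5.
low, high = interval(51.54, 7.51, 3.5)
print(f"{low:.3f} to {high:.3f}")
print(f"at least {chebyshev_min_share(3.5):.1%}")
```

For k = 3.5 the bound is 1 - 1/12.25, about 0.918, so at least roughly 91.8% of the contributions fall in the stated range whatever the shape of the distribution. The Empirical Rule gives sharper percentages, but only when the distribution is symmetrical and bell-shaped.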